hierarchy involved an orderly progression from a concept involving
dimensions (concept 3). This established the basis for computing
mastery evaluation cut rules on the basis of the model. Reliable
differences occurred for training level and for concept difficulty, 
but not for test length or item types. The results of the validity 
analysis were, in general, favorable to the model. It is thus 
concluded that the proposed model is reasonably valid. This evidence 
could be used as a basis for a demonstration or experimental 
implementation of the model in an educational environment that uses 
SUMMARY



The research described in this report was designed to investigate
experimentally the validity of an evaluation model for mastery testing
applications. This model is based on the assumptions that: (1) the
learning of fundamental skills can be considered all or none; (2) each
item response on a single skill test represents an unbiased sample of
the examinee's true mastery status; (3) measurement error occurring on
the test (as estimated from the average interitem correlation) can be of
only one type (α or β) for each examinee; and (4) through practical and
theoretical considerations of evaluation error costs and item error
characteristics, an optimal mastery criterion can be calculated. Each
of these assumptions is discussed, and the resultant mastery criteria
algorithm is presented in a form amenable to experimental validation.

Operationalization of the key parameters of the model was accom- 
plished through use of a train-test design in a concept attainment 
experiment. Two alternative models of hierarchical concept attainment
were used as the basis of materials development and sequencing. The
mastery probability parameter was manipulated by varying the extent of
training Ss received at each step in the attainment hierarchy. The
item error parameter was manipulated by varying the number of alterna- 
tives (choices) in the test item. The length parameter was manipulated 
by using tests of varying length. 

The experiment was conducted on 96 third grade Ss. Each S received
familiarization training and then proceeded through the train-test
sequences of the experiment. A given S was assigned to either of two
train-transfer paradigms (intradimensional shift versus no-shift) under
one of three levels of training (low, moderate, high). The moderate
training level was the theoretical average number of trials required
to attain the concept. The concept hierarchy involved an orderly
progression from a concept involving one relevant of three varying
dimensions through two relevant of four varying dimensions (concept 2) to
four relevant of six varying dimensions (concept 3).

Immediately following training, Ss received concept attainment
tests in the form of blank trials. A given S was assigned to either
continuous or terminal testing, and was given either five or 10
item tests that were either two or four choice.
Three groups of analyses were performed: training, testing, and
model validation. The results of the analysis of training data
validated the assumptions underlying the levels manipulation, and provided
support for a cumulative component learning and transfer paradigm (the
no-shift sequence). Performance also was shown to interact with the
level of training in terms of initial trial solution biases. When this
bias was controlled for, the hypothesized outcomes became more apparent.

Analysis of test data failed to reveal any systematic performance 
differences other than those observed as a function of training manipu- 
lations. That is, reliable differences occurred for training level and 
for concept difficulty, but not for test length or item type. This 
concurrence of training and test results established the basis for im- 
plementation of the mastery evaluation algorithm and for subsequent 
assessment of its operational validity. 

The results of this validity analysis were, in general, favorable 
to the model. For example, the assumption that the cut rules are opti- 
mal, for given measurement and decision error constraints, was supported 
in that the optimization ratio was computed to be 0.95. It also is 
shown that the overall test distributions for five and 10 item tests 
separate into the respective mastery and nonmastery components quite 
differently on individual than on aggregated bases. Furthermore, the
theoretical and empirical distributions of mastery and nonmastery scores
show reasonably good fit for the five item tests, although not so good
for the longer tests.

It is thus concluded that the experimental evidence provided by 
this research is reasonably supportive of the validity of the proposed 
mastery evaluation model. This evidence could be used as a basis for 
a demonstration or experimental implementation of the model in an edu- 
cational environment that uses mastery, or criterion referenced, evalua- 
tion procedures. 
Among the more exciting and promising trends currently emerging
with educational innovations and reforms is a shift from traditional
classroom instruction with its norm-referenced testing procedures to
more individualized instructional systems based on criterion referenced
test procedures (e.g., Block, 1971). One particularly successful
system is known as Individually Prescribed Instruction (IPI). It places
emphasis on materials and methods of instruction, matching these with
level of achievement, past progress, and perhaps learning style of the
student in generating a work assignment or "prescription" (Cooley and
Glaser, 1969; Glaser, 1967). However, as with other individualized
systems, IPI is critically dependent on the existence of a reliable and
valid measurement model to indicate when the student has attained each
skill-mastery state.

It is not difficult to show that the traditional measurement pro- 
cedures are inadequate, or at best arbitrary as a method of identifying 
student skill mastery. For example, using criterion referenced proce- 
dures, IPI has suggested an 85 percent correct minimum as a mastery cri- 
terion for any skill test (of which there are more than 400). Although 
this criterion does have intuitive appeal, there is no convenient ana- 
lytical or empirical justification for it. Just as various skills may 
differ in level of difficulty in terms of mastery, so also might the 
optimal performance criteria in the test situation vary. It may be that 
for some skills, a test score of 60 percent is indicative of mastery, 
whereas for others a score of 90 percent or higher would be required. 

In brief, the issue is not whether a criterion referenced testing
procedure is or is not appropriate to IPI, but rather how and at what level
each criterion should be set.

To anchor the skill-testing procedure to the operations and
outcomes of individualized instructional technology, a skill mastery test
model is proposed (Emrick, 1971) in which both item and student
information are combined, yielding probability statements regarding
skill-mastery status. This model is particularly attractive in that the
assumptions are few and simple, and it provides for empirical
determination of the most critical parameters, namely the item measurement
error likelihoods. Furthermore, the generation of a test cut rule or mastery
criterion is provided by an algorithm that is based on both the test
properties and a cost-benefit analysis of decision errors.



Model 

The basic working assumptions of this mastery test model are as 
follows : 

(1) Appropriately defined educational objectives will consist of,
or can generally be analyzed into, collections of unitary and
explicitly defined (in terms of performance criteria) skills
(Gagne, 1962). For each of these skills, mastery will be a
binary (all or none) variable. Thus, for an educational
objective to be mastered completely, all component skills must
be mastered. Further, the degree or "level" of mastery of the
objective will be determined by the proportion or number of
these component skills that are mastered.

(2) Tests designed to assess mastery of the component skills
within an objective will each consist of collections of test
items that are highly homogeneous in terms of content, form,
and difficulty level. Thus, within a single skill test, each
item response provides an unbiased estimate of the examinee's
mastery status with respect to that skill.

(3) Since mastery of each unitary skill is assumed to be an all or
none variable, the measurement error for a given examinee on
a single such skill will be of only one of two types: (1)
type 1 or alpha (α), in which the examinee's responses lead to
a mastery conclusion when his true status is nonmastery; or
(2) type 2 or beta (β), in which his item responses lead to a
nonmastery conclusion when in fact he has mastered the skill
in question. Stated differently, an examinee can occupy only
one status with respect to the skill being tested: mastery or
nonmastery. Test item responses that correspond to the
examinee's true status are, by definition, valid (i.e., mastery
students "pass" the item and nonmastery students "fail" the
item). Test item responses that do not correspond to the
examinee's true status are, also by definition, measurement
error (i.e., "lucky guesses" and so forth for nonmastery
examinees and "careless errors" and so forth for mastery
examinees). This situation is represented in Table 1.



Table 1

Item Response Contingencies as a Function of Learning State
on a Single Skill Mastery Test

                             Observed Response
True Learning State      wrong (w)      correct (c)

mastery                      β             1 - β
nonmastery                 1 - α             α

α = The probability of a correct response from a
    nonmastery student (type 1 error)

β = The probability of a wrong response from a
    mastery student (type 2 error)

(1 - α) = The probability of a valid wrong response
(1 - β) = The probability of a valid correct response



(4) The extent of measurement error in a single skill test can be
approximated by calculating the average interitem correlation
of examinee responses to the parallel and homogeneous test
items. This average interitem correlation provides an
unbiased estimate of the reliability of a single item of a
unitary skill test. Since reliability is defined as the
proportion of total variance that is "true" variance (Lord and
Novick, 1968; Ghiselli, 1964), this average interitem
correlation can be interpreted as an unbiased estimate of the squared
correlation between an examinee's true mastery state and his
item response.

This correlation between mastery state and item response can
further be interpreted in terms of the two classes of
measurement error (α and β) with reference to Table 1. The response
contingencies from this fourfold table are calculated in the
form of a phi (φ) coefficient, indicating the correlation
between observed responses and "true" score. One formula for
computing φ from Table 1 is:



\[
\phi = \frac{1 - \alpha - \beta}{\sqrt{1 - (\alpha - \beta)^{2}}} \qquad (1)
\]



Since reliability is defined as the squared correlation
between fallible and true scores, and since the above
expression represents this correlation on the item level, the square
of equation (1) is the expression for item reliability on a
single skill mastery test.

(5) Because of the presence of at least some measurement error,
decision errors correspondingly will accrue regarding
determination of examinee status on the skills being measured. A
decision-theoretic approach to this problem (Chernoff and Moses,
1959) suggests regret resulting from these evaluation errors
can be minimized through a cost-benefit analysis of the
variables that comprise the evaluative process. Three classes of
these variables are:

• Statistical, such as item reliability and test length. As
item scores become more reliable, tests of fixed length
will yield proportionally fewer evaluative errors. Similarly,
for a given item reliability, increasing test length
by adding parallel items will operate to differentiate more
clearly mastery from nonmastery examinees (Emrick and Adams,
1969). However, it is not completely clear what the costs
involved in improving item reliability would be, or, aside
from following principles of item construction (Bormuth,
1970), how one actually would manipulate item reliability.
Further, the gain in overall test reliability diminishes
with each further increase in item reliability or test length
(Lord and Novick, 1968).

• The second class of variables is described as curricular.
Specific instructional objectives are seen to vary with
respect to the importance they occupy in differing
instructional models. For objectives viewed as ancillary to the
model, evaluative regret will be low or irrelevant. However,
for objectives viewed as fundamental to the model and
prerequisite to further learning, regret can become sizable.
This regret will accrue from the two types of evaluative
(decision) error: (1) Type I, or a "false pass," comprising
costs of reduced efficiency in mastery of subsequent
objectives due to nonmastery of prerequisites, and, if the
learner eventually "bogs down," the costs of diagnosis and
remediation; and (2) Type II, or a "false fail," comprising
costs of unnecessary exercises, materials, and
instructional time given this mastery student, as well as the
costs of subsequently retesting him.

• The third class of factors that enters into regret is in 
terms of psychological costs resulting from decision er- 
rors. For example, regret for Type I evaluative errors 
would consist of psychological costs to the "out of track" 
learner, such as confusion, suboptimal success rate, and 
the like. The Type II regret would include psychological 
costs such as boredom, decreased sense of achievement, 
lower motivation, and so forth.
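The statistical point above (longer tests of parallel items separate
mastery from nonmastery examinees more cleanly) can be illustrated with
a short Python sketch. It assumes, beyond what the text states, that
item responses are independent given the examinee's true state, so that
the number correct is binomial with success probability α for
nonmastery and 1 - β for mastery examinees. The function names and the
example cut points are illustrative only.

```python
from math import comb


def score_distribution(n, p_correct):
    """Binomial probabilities of 0..n correct on an n-item test."""
    return [comb(n, x) * p_correct ** x * (1 - p_correct) ** (n - x)
            for x in range(n + 1)]


def decision_error_rates(n, cut, alpha, beta):
    """Return (false pass rate, false fail rate) for a given cut point.

    cut is the minimum number correct that leads to a mastery decision.
    alpha is P(correct | nonmastery); beta is P(wrong | mastery).
    """
    nonmastery_scores = score_distribution(n, alpha)
    mastery_scores = score_distribution(n, 1 - beta)
    false_pass = sum(nonmastery_scores[cut:])   # nonmastery examinee passes
    false_fail = sum(mastery_scores[:cut])      # mastery examinee fails
    return false_pass, false_fail


if __name__ == "__main__":
    for n, cut in ((5, 3), (10, 6)):
        fp, ff = decision_error_rates(n, cut, alpha=0.2, beta=0.2)
        print(f"n={n:2d} cut={cut}: false pass={fp:.4f} false fail={ff:.4f}")
```

With the same item error probabilities, both error rates are smaller
for the 10 item test than for the five item test, which is the sense in
which added parallel items reduce evaluative errors under these
assumptions.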

If meaningful quantitative values could be independently assigned
to each of these cost factors, as well as to α and β item error
probabilities, then the generation of optimal mastery criteria for a given
test would be straightforward. But since no such values are
conveniently available (nor are they likely to be in the foreseeable future),
Emrick (1971) and Emrick and Adams (1969) have proposed that mastery
cutoff scores be optimized in terms of relative decision error costs
and relative item error probabilities. Hence the optimization formula:



\[
K = \frac{\log\dfrac{\beta}{1 - \alpha} + \dfrac{1}{n}\,\log RR}
         {\log\dfrac{\alpha\beta}{(1 - \alpha)(1 - \beta)}} \qquad (2)
\]



where 

K = the cut point expressed as a percent score on the test
α = estimated probability of Type I item error
β = estimated probability of Type II item error
RR = ratio of regret of Type II to Type I decision errors
n = test length (number of items).



The determination of the values that enter into equation (2)
follows from the above described assumptions of the model. Specifically,
the total probability of item measurement error (α + β) is estimated as
one minus the square root of the average interitem correlation (i.e.,
1 - √r̄). A logical analysis of item form yields an estimate of which
type of error predominates. For example, a true-false test should yield
relatively more α than β errors, whereas for recall or completion items,
the reverse should be true. Using these estimates of the sum and ratio
of the errors, an estimate of each error component can be obtained.
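The estimation step just described can be sketched in Python. The
sketch takes the average interitem correlation and a judged ratio of α
to β (the ratio is supplied by the analyst; the text gives no numerical
values for it), and splits the total error 1 - √r̄ accordingly. It also
evaluates the item-level φ, reading equation (1) as
φ = (1 - α - β) / √(1 - (α - β)²). The function names are illustrative.

```python
import math


def split_item_error(r_bar, alpha_to_beta):
    """Split total item error (1 - sqrt(r_bar)) into (alpha, beta).

    r_bar:         average interitem correlation of the skill test
    alpha_to_beta: judged ratio alpha / beta (an analyst's assumption)
    """
    total_error = 1.0 - math.sqrt(r_bar)
    beta = total_error / (1.0 + alpha_to_beta)
    alpha = total_error - beta
    return alpha, beta


def item_phi(alpha, beta):
    """Item-level correlation with true state, as equation (1) is read here."""
    return (1.0 - alpha - beta) / math.sqrt(1.0 - (alpha - beta) ** 2)


if __name__ == "__main__":
    for ratio in (1.0, 3.0):
        a, b = split_item_error(0.64, ratio)
        print(f"ratio={ratio}: alpha={a:.3f} beta={b:.3f} "
              f"phi^2={item_phi(a, b) ** 2:.4f}")
```

When the two error types are judged equal, φ² reproduces the average
interitem correlation exactly; when they are unequal, the rule
α + β = 1 - √r̄ is only approximate, because the denominator of
equation (1) then falls below one.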

A similar procedure is used to supply a value for the ratio of 
regret (RR). Specifically, a logical analysis of the evaluative
procedure will yield an estimate of the more costly decision error, possibly
in conjunction with some estimate of the examinee's status before
testing. In some testing situations, false fail errors may be considered
far more costly than false passes whereas in other cases the reverse may 
be true. Also, there may occur cases where the two costs are judged 
essentially equal. Actually, many teachers operate in this fashion, 
deciding to err either on the "high" or "low" side, depending on the 
skill as well as the examinee being evaluated. These estimates are 
operationalized in equation (2) as RR. Finally, by indicating the test
length (n) in addition to the above values, tables of optimal mastery 
criteria can be generated for virtually any single skills test. 
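A table of mastery criteria of the kind mentioned above can be
generated with a few lines of Python. The sketch reads equation (2) as
K = [log(β / (1 - α)) + (1/n) log RR] / log[αβ / ((1 - α)(1 - β))],
with K taken as the proportion of items correct. The error
probabilities, regret ratios, and function names below are illustrative
assumptions, not values from the report.

```python
import math


def cut_score(alpha, beta, rr, n):
    """Cut point K (proportion correct) from equation (2) as read above.

    alpha: P(correct | nonmastery)    beta: P(wrong | mastery)
    rr:    regret of a false fail relative to a false pass
    n:     number of items on the test
    The base of the logarithm cancels in the ratio.
    """
    numerator = math.log(beta / (1.0 - alpha)) + math.log(rr) / n
    denominator = math.log((alpha * beta) / ((1.0 - alpha) * (1.0 - beta)))
    return numerator / denominator


def cut_score_table(alpha, beta, lengths, regret_ratios):
    """Map (n, rr) to the cut point for each combination."""
    return {(n, rr): cut_score(alpha, beta, rr, n)
            for n in lengths for rr in regret_ratios}


if __name__ == "__main__":
    table = cut_score_table(0.1, 0.1, lengths=(5, 10),
                            regret_ratios=(0.5, 1.0, 2.0))
    for (n, rr), k in sorted(table.items()):
        print(f"n={n:2d} RR={rr:3.1f} K={k:.3f}")
```

With equal item error probabilities and equal regret (RR = 1) the cut
point is 0.5 regardless of test length. Raising RR, so that false fails
are judged the more costly error, lowers the cut point, and the
adjustment shrinks as the test gets longer.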

The goals of this research were to validate empirically this eval- 
uation approach to mastery testing. To accomplish this, it is necessary 
to establish — and to some extent quantify — manipulations among the rele- 
vant parameters of the model. These parameters are: 

• Mastery probability, or the probability that the student has at- 
tained mastery of the skill at the time of testing. 

• Item error likelihood, or the relative probabilities of α (false
pass) and β (false fail) measurement error occurring in the test.

• The length of the test. 

The first of these parameters is the most difficult to directly
establish, since it must be derived on the basis of a model of learning,
or inferred on the basis of performance. The procedure adopted in this
research was to consolidate Gagne's "acquisition of knowledge"
hierarchical learning model (versus Neisser and Weene's logical complexity
model) with Trabasso and Bower's "discrimination-attention" learning
assumptions in the form of type of problems and level of training in
concept identification tasks.
Since the second parameter is primarily a function of item type or
format and related characteristics (i.e., measurement procedure, which
can be brought under experimental control), and since test length is
clearly an objective parameter, the major part of the following
discussion will deal with the concept identification task.



Concept Identification Task 

The subject's task in a concept identification problem is described
generally as involving at least two components: the identification of
the relevant dimensions, and the identification of the rule or rules
that bring the attributes together in a particular fashion (Bruner,
Goodnow and Austin, 1956; Haygood and Bourne, 1965; Bourne, 1968).
Given a set of dimensions a, b, c . . . x each with n values or
attributes (a1, a2, a3 . . . an; b1, b2, and so forth), the learner's task
is to discover or identify the attributes that satisfy the conditions
defining the concept. The attributes that satisfy the concept
definition are said to be relevant and the dimensions to which they belong are
called relevant dimensions. All other attributes that vary, either
within or across instances of the concept, are described as irrelevant.

The method and structure by which relevant and irrelevant attri- 
butes are arranged determines the conceptual rule. Neisser and Weene 
(1962) have shown that when the number of relevant attributes is re- 
stricted to two, there are 10 such conceptual rules, as summarized in 
Table 2. 
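The ten rules can be made concrete by writing each as a Boolean
function of the two focal attributes used in Table 2 (red and square).
The Python sketch below is an illustration added for clarity; the
encoding and names are assumptions, not part of the cited studies.

```python
from itertools import product

# Each rule maps (is_red, is_square) to True when the pattern is an example.
RULES = {
    "affirmation":           lambda red, square: red,
    "conjunction":           lambda red, square: red and square,
    "inclusive disjunction": lambda red, square: red or square,
    "conditional":           lambda red, square: (not red) or square,
    "biconditional":         lambda red, square: red == square,
    "negation":              lambda red, square: not red,
    "alternative denial":    lambda red, square: not (red and square),
    "joint denial":          lambda red, square: not (red or square),
    "exclusion":             lambda red, square: red and not square,
    "exclusive disjunction": lambda red, square: red != square,
}

# The four stimulus types, in a fixed order.
STIMULI = list(product((True, False), repeat=2))


def truth_table(rule):
    """Tuple of example/non-example decisions over the four stimulus types."""
    return tuple(bool(rule(red, square)) for red, square in STIMULI)


if __name__ == "__main__":
    for name, rule in RULES.items():
        print(f"{name:22s} {truth_table(rule)}")
```

Each rule partitions the four stimulus types differently, and the ten
rules fall into five complementary pairs (for example, conjunction and
alternative denial).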

Research on attribute learning has demonstrated the effects of such
variables as the number of relevant and irrelevant dimensions (Walker
and Bourne, 1961) and the amount of intra- and inter-dimensional
variability (Battig and Bourne, 1961). For example, Battig and Bourne's (1961)
investigation on the effects on error rate of changes in the number of
dimensions revealed that college students made more errors following
both inter- and intra-dimensional variations. Further, this relationship
between error rate and intradimensional variability was found to
correspond closely to a straight line function.

The amount of irrelevant and relevant information also has been
shown to contribute to task complexity. Although it would seem on an
intuitive basis that increased relevant information should increase the
difficulty of the conceptual task, it is not so obvious that increased
irrelevant information should do so. Actually, the amount of irrelevant
information affects only the complexity of the stimulus pattern, since
the number and type of categories into which the patterns must be sorted
will remain the same. Further, Walker and Bourne's (1961) study
indicated an interaction between the amount of both relevant and irrelevant
information and problem difficulty. Errors increased at a positively
accelerated rate with increases in relevant information, but this effect
depended on the level of irrelevant information employed in a problem.

Table 2

Conceptual Rules Describing Partitions of a Population
with Two Focal Attributes (Red and Square)

Affirmation             All red patterns are examples of the concept.

Conjunction             All red and square patterns are examples.

Inclusive disjunction   All patterns which are red or square or both
                        are examples.

Conditional             If a pattern is red then it must be square to
                        be an example.

Biconditional           Red patterns are examples if and only if they
                        are square.

Negation                All patterns which are not red are examples
                        of the concept.

Alternative denial      All patterns which are either not red or not
                        square are examples.

Joint denial            All patterns which are neither red nor square
                        are examples.

Exclusion               All patterns which are red and not square are
                        examples.

Exclusive disjunction   All patterns which are red or square but not
                        both are examples.

Modified from Haygood and Bourne, 1965.
Neisser and Weene (1962) demonstrated that rules are not of equal
learning difficulty even though they refer to the same set of attributes. 
Neisser and Weene further showed that the different rules fall logically 
into three categories or levels based on the number of component ele- 
ments. Their results indicated that the degree of difficulty of each 
rule increased from level to level. Although the results do not seem 
surprising, it is not immediately clear why the different rules would 
distribute themselves along this continuum of difficulty. 

Several explanations have been offered for why
certain rules are more difficult to attain than others. One possibility
suggested by Haygood and Bourne (1965) is that subjects are forming and
testing various rule hypotheses until the correct one is discovered.
Thus, as a concept increases in complexity, more rules become available,
reducing the probability of an early solution. This explanation is
similar to, if not the same as, the decision tree model suggested by
Hunt (1962, cited by Haygood and Bourne, 1965). In a study reported by
Neisser and Weene (1962) this assumption of availability of rules was
evaluated. A computer was programmed to identify concepts of varying 
difficulty using a logical elimination strategy. The results indicated 
that the time (number of steps) required for the computer to identify 
each concept was inversely related to the structural simplicity of the 
rule. These results strongly imply that something other than, or in 
addition to, simple logical elimination is involved in human concept 
identification strategy. 
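For readers unfamiliar with the idea, a logical elimination strategy
can be sketched in a few lines of Python. This is a toy illustration
only: it is not the program reported by Neisser and Weene, its
candidate set is limited to the ten two-attribute rules, and its step
counts depend on the presentation order assumed here, so it should not
be read as reproducing their results.

```python
# Truth tables over four stimulus types, in the order:
# (red, square), (red, not square), (not red, square), (not red, not square)
RULES = {
    "affirmation":           (True,  True,  False, False),
    "conjunction":           (True,  False, False, False),
    "inclusive disjunction": (True,  True,  True,  False),
    "conditional":           (True,  False, True,  True),
    "biconditional":         (True,  False, False, True),
    "negation":              (False, False, True,  True),
    "alternative denial":    (False, True,  True,  True),
    "joint denial":          (False, False, False, True),
    "exclusion":             (False, True,  False, False),
    "exclusive disjunction": (False, True,  True,  False),
}


def steps_to_identify(target, order=(0, 1, 2, 3)):
    """Count stimuli needed until only the target rule survives.

    After each stimulus, every candidate rule whose classification
    disagrees with the feedback for the target rule is eliminated.
    """
    candidates = set(RULES)
    for step, stimulus in enumerate(order, start=1):
        feedback = RULES[target][stimulus]
        candidates = {name for name in candidates
                      if RULES[name][stimulus] == feedback}
        if candidates == {target}:
            return step
    raise ValueError("target rule was not uniquely identified")


if __name__ == "__main__":
    for name in RULES:
        print(f"{name:22s} identified after {steps_to_identify(name)} stimuli")
```

Because the candidate set is small, pure elimination settles on any of
these rules within four informative stimuli in this sketch.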

Furthermore, the hypothesis of differential rule difficulty, even
though plausible, still does not seem to account for all the data. For
example, Neisser and Weene (1962) reported that subjects seemed to have
better verbal understanding of complex rules such as "either/or" than of
the more rapidly learned (i.e., "easier") conjunctive rules.

In view of all these arguments, Neisser and Weene (1962) suggest
that their data can be better explained in terms of a hierarchical
organization. According to these authors, the facilitative effect of
learning lower level concepts before learning more complex concepts lies
in the fact that to solve rule (A · -B) subjects must learn what (A) and
(-B) mean; following the same reasoning, learning (A · -B) will
facilitate learning of (A · -B) ∨ (-A · B). It thus appears important to turn to
the issue of hierarchical conceptual learning.



Hierarchical Organization of Concepts 



Neisser and Weene's data have a theoretical bearing on a conceptual
learning model proposed by Gagne, which is fully described in Gagne
(1965). Specifically, Gagne's model describes learning as increasing in
stages of complexity and difficulty in hierarchical terms. The
differential difficulty of concept learning for ostensibly similar concepts,
as reported by Neisser and Weene, corresponds well to his theoretical
interpretation.

The basic working principle of Gagne's model is the description of
learning as a cumulative process. More specifically, he states that
"within limitations imposed by growth, behavioral developments result
from the cumulative effects of learning" (Gagne, 1968, p. 178).

Since Gagne has been concerned basically with applied research, his 
work deals with instructional procedures for the teaching of mathemati- 
cal concepts (Gagne, 1962a, 1962b, 1965, 1966). In these studies he has 
shown consistently that a complex task can be broken down into its com- 
ponents such that performance in each step of this sequence is dependent 
on mastery of the previous steps (e.g., Gagne, 1962). 

Gagne's model also involves mostly what he calls "rule" or "prin- 
ciple" learning. A rule or principle is basically a concept but is 
distinguished from the latter in that: 

(1) While attainment of a concept can be shown by means of an
identificatory response (concrete concept or concept by
observation), the rule or principle has to be demonstrated
(abstract concept or concept by definition) (Gagne, 1966).

(2) A rule or principle is composed of associations, motor and
verbal chains, multiple discriminations, concepts, and simple
rules (in the case of complex rules) (Gagne, 1965, 1968).

One of the implications of a rule or principle (as opposed to a
concept by observation) is that it is not "learned" but has to be taught
(Gagne, 1966). The distinction here seems to relate to the level of
abstraction involved in each of these two kinds of concepts. For
example, one might expect a subject to learn to identify the radius of a
circle even though he is not able to define what the radius of a circle
is. The relevant attributes of the concept are all physically
contained in the instance and can be isolated, for example, simply by
differential reinforcement. A rule or principle, however, requires
relational operations that go far beyond the observable properties of the
stimuli (e.g., in the principle of "work"). According to Gagne, even in
the case where the subject has mastered all the discriminations and
concepts involved in the rule, he is not likely to demonstrate the rule if
he has not been taught it. Therefore, rule learning as defined by Gagne
seems to differ considerably from the process usually studied in
psychological research, which deals with what he calls concept by observation.

Moreover, when one thinks of the concepts that constitute a
mathematical rule, it is apparent that the hierarchical organization of
information becomes an end rather than a means. In the learning of the
rule 2N-1, to learn what "-" means is not a facilitatory device but
rather a prerequisite (unless, of course, the rule is changed). This
notion of hierarchies comprised of prerequisites is recognized by Gagne.

The hypothesis is proposed that specific transfer from one 
learning set to another standing above it in the hierarchy 
will be zero if the lower one cannot be recalled and will 
range up to 100% if it can be. (Gagne, 1962, p. 358) 

There is enough evidence, however, that rules can be learned at any 
level independently of learning rules from presumably subordinate levels. 
Haygood and Bourne (1965) and Bourne (1968) have shown consistently that
if subjects are given training in discovering rules there is an
improvement from problem to problem much like the phenomenon of learning sets
described by Harlow (1959). Moreover, Haygood and Bourne's (1965) study
also included a condition in which subjects had to learn both a rule and 
the attributes. Although the performance of this group was considerably 
poorer than that of the other two groups (rule learning with attributes 
given and attribute identification with rule given), there is no doubt 
that subjects did learn the task. 

Therefore, although Neisser and Weene's results and the work
developed by Gagne seem to complement each other,
more basic research is needed to clarify some of the problems in
hierarchical organization of concepts.



Experiment 



The preceding discussion has presented two major theoretical
alternatives regarding the learning mechanisms in progressing through
hierarchically structured learning sequences. The concept complexity model
proposed by Neisser and Weene assumes little "level to level" transfer,
but views progress to be more a function of the type and number of rules
involved at each step in the hierarchy. However, Gagne posits the
cumulative learning model, and argues that progress at any step is
primarily a function of the number and type of prerequisites already
mastered (presumably at previous stages).

Since the goals of this research include the development of
experimental learning sequences that can be considered suitable analogues
to the learning tasks and sequences currently incorporated in
individualized curricula, the above theoretical dispute becomes important, i.e.,
Neisser and Weene's model assumes that step-to-step cumulative component
continuity (stimulus specific) in a learning hierarchy is not nearly so
important as is type and number of component rules. Their model seems
based almost completely on the component analysis of a logical tree
structure.

However, Gagne’s model argues that the critical feature in hier- 
archical learning is the cumulative continuity of component stimulus 
attributes, which leads to the postulation of phenotypical tree struc- 
ture in hierarchical learning. 

To preserve the essential components of both models, the materials
to be used in the training and testing of this experiment were designed
to provide a crude test of the learning assumptions of each. This
design consideration involved the generation of two sets of materials
(described later), one based on Gagne's cumulative model and a parallel
set based on Neisser and Weene's logical model. One advantage of this
strategy (the use of two materials sets) is that it allows an
evaluation of these competing concept learning models. Another
advantage is that it protects against the likelihood that the validation
evidence for the mastery test model will be atypical or biased with regard
to assumptions of hierarchical learning.

A second manipulation that is based on theoretical assumptions per- 
tains to the extent of training required to generate variations in 
mastery likelihood. This is an important consideration because the 
mastery model is essentially based on an all-or-none learning assumption, 
and although individual differences will occur, the range of training 
will be critical for purposes of establishing these variations in
mastery. That is, if the range is too low, too few "mastery" cases will
result, and if the range is too high, extremely few cases will be at
nonmastery (in addition to the boredom and fatigue factors discussed
previously).

Two relevant theoretical models used in developing the training
range were Trabasso and Bower's Relevant and Redundant Cue model (1968)
and Estes' Stimulus Sampling Model. Interestingly, although these two
models are based on somewhat different assumptions, both lead to nearly
identical estimates of the average number of training trials necessary
to produce mastery (or attainment) of a concept. Thus, for each concept
to be included in the training sequence, the estimate of the average
number of trials to mastery was derived and used to define the moderate
or middle level training condition. The two other training levels used
in this experiment were defined relative to this middle level as
follows: low or level 1 was set equal to one-half the number of trials
for level 2, and high or level 3 was set equal to twice the number of
trials for level 2. This procedure yielded the schedule displayed in
Table 3 (schedule equals number of trials) for the concept hierarchy
used in the experiment.



Table 3

Number of Training Trials by Level
for Each of the Concepts in the Experiment

                          Concept Hierarchy
Training Level     Concept 1     Concept 2     Concept 3

Level 1                10            10            20
Level 2                20            20            40
Level 3                40            40            80


These three training levels effectively operationalize the mastery like- 
lihood parameter of the mastery evaluation model. 
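
The rule behind the schedule in Table 3 can be written out as a short sketch. The level 2 trial counts are read from Table 3; the function and variable names are illustrative only and are not part of the original study.

```python
# Moderate (level 2) trial counts per concept, as listed in Table 3.
LEVEL2_TRIALS = {"concept 1": 20, "concept 2": 20, "concept 3": 40}


def training_schedule(level2_trials):
    """Return trials per concept for levels 1-3, given the level 2 counts."""
    schedule = {}
    for concept, moderate in level2_trials.items():
        schedule[concept] = {
            1: moderate // 2,  # low: one-half of level 2
            2: moderate,       # moderate: estimated mean trials to mastery
            3: moderate * 2,   # high: twice level 2
        }
    return schedule


schedule = training_schedule(LEVEL2_TRIALS)
for concept, levels in schedule.items():
    print(concept, levels)
```

Run as written, this reproduces the nine entries of Table 3.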

The two other factors or parameters described in an earlier section
are test length and item error likelihood. For this experiment, a test
corresponds to blank trials. That is, training trials are characterized
by stimulus presentation, response interval, and feedback or knowledge
of correct response (KCR). Test trials are characterized by stimulus
presentation and response interval, but no feedback or KCR. Test or
blank trials are administered following each training sequence.

The test length parameter was operationalized in this experiment at
two levels: five and ten items. These levels or lengths correspond
roughly to the range of most single skill tests, such as the curriculum
imbedded tests of IPI. The item error parameter was operationalized in
terms of two item forms for the blank trials (i.e., tests). These were
two choice items, corresponding theoretically to true-false tests, and
four choice items, corresponding to four option multiple choice items.
The assumption of alpha errors being greater for two than four choice
items constitutes the item error manipulation.
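
Why alpha errors should be larger for two choice than for four choice items can be illustrated with a simple guessing calculation. This is only a sketch under stated assumptions: a nonmaster is assumed to respond by pure guessing, and the cutoffs of 4 of 5 and 8 of 10 correct are hypothetical values chosen for illustration. Neither assumption is taken from the report, whose model estimates item error differently.

```python
from math import comb


def p_false_mastery(n_items, n_choices, cutoff):
    """P(a guessing examinee scores at least `cutoff` of `n_items` correct)."""
    p = 1.0 / n_choices
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cutoff, n_items + 1)
    )


# Hypothetical cutoffs (80 percent correct); not taken from the report.
for n_items, cutoff in ((5, 4), (10, 8)):
    for n_choices in (2, 4):
        prob = p_false_mastery(n_items, n_choices, cutoff)
        print(f"{n_items} items, {n_choices}-choice, cutoff {cutoff}: {prob:.4f}")
```

Under these assumptions the chance of a false mastery decision is larger for the two choice form at both test lengths, which is the direction the manipulation assumes.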

An additional factor was incorporated in this experiment to control
for the effects of repeated blank trials. This factor, described as test
continuity, essentially splits the sample into two groups: one
receiving blank trials immediately after each training sequence, and
another receiving these trials only after the completion of all
training. These two groups are designated as continuous and terminal
testing, respectively.
Chapter II



METHOD 



Research Design 



The experiment consisted of a three-way repeated measures analysis
of variance design for the training/transfer manipulation, fully crossed
with a three-way analysis of variance design on the test manipulations.
These fully crossed factors are as follows:

(1) Training/transfer factors

• Transfer paradigm: cumulative intradimensional attribute
shifting (Neisser and Weene) versus no-shift (Gagne)

• Training level: low or level 1 versus moderate or level 2
versus high or level 3, where L1 = 1/2 L2 and L3 = 2 x L2

• Conceptual hierarchy: concept 1 (three dimensions, one
relevant) to concept 2 (four dimensions, two relevant) to
concept 3 (six dimensions, four relevant)

(2) Test factors

• Test length (five item versus ten item)

• Item form (two choice versus four choice)

• Continuity (continuous versus final)

These factorially balanced design factors are presented schematically
in Figure 1a for the training components and in Figure 1b for the
testing components.
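
The cell counts implied by this design can be checked with a brief enumeration. The sketch below assumes only the factor levels listed above and the total of 96 Ss; the variable names are illustrative. Concept is a within-subject (repeated) factor and is therefore not used to partition Ss.

```python
from itertools import product

paradigms = ["no-shift", "shift"]
levels = [1, 2, 3]
test_lengths = [5, 10]
item_forms = ["2-choice", "4-choice"]
continuity = ["continuous", "final"]

N = 96  # total number of Ss
training_cells = list(product(paradigms, levels))
test_cells = list(product(test_lengths, item_forms, continuity))
crossed_cells = list(product(training_cells, test_cells))

print(len(training_cells), N // len(training_cells))  # 6 cells, 16 Ss each
print(len(test_cells), N // len(test_cells))          # 8 cells, 12 Ss each
print(len(crossed_cells), N // len(crossed_cells))    # 48 cells, 2 Ss each
```

Fully crossing the training and test factors leaves two Ss in each of the 48 cells.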
Paradigm      Training Level      Concepts 1, 2, 3 (repeated measures)

No-shift            1                  n = 16
                    2                  n = 16
                    3                  n = 16

Shift               1                  n = 16
                    2                  n = 16
                    3                  n = 16

N = 96

Figure 1a  Schematic representation of the research design for the
training phase of the experiment.






                                 Test Continuity
Test Length     Item Type     Continuous     Final

5-Item          2-Choice       n = 12        n = 12
                4-Choice       n = 12        n = 12

10-Item         2-Choice       n = 12        n = 12
                4-Choice       n = 12        n = 12

N = 96

Figure 1b  Schematic representation of the research design for
the testing/validation phase of the experiment. This design was
fully crossed with the training phase design.
Subjects



A total of 96 third grade boys and girls were recruited for this
experiment from three elementary schools in Western Massachusetts.
These schools were in the East Whately, Greenfield, and West Springfield
school districts. Within each school, children were assigned to
experimental conditions at random, with the one exception that sex be
uniformly distributed across conditions.

Materials 



The materials that composed the concepts to be identified in this 
experiment consisted of brightly colored geometric forms of varying 
sizes and shapes and on which discernable patterns or textures had been 
imprinted. These stimuli comprised the conceptual dimensions of shape, 
color, size, and pattern. Two additional dimensions of position and 
number were generated through the use of varying numbers of Xs that 
were situated either above or below the stimulus form. 

For a given trial (training or testing), stimuli were arranged in 
a four choice display, according to a schedule described below, and 
photographed on a 35mm color film. These photographed trials were 
mounted in slide frames and presented via slide projector in the 
training and testing phases of the experiment. 

Each child recorded his choice for a given trial by marking a 
corresponding box or position in his response booklet. For training 
problems, this booklet consisted of one page for each trial. Each page 
contained four empty boxes corresponding to the four stimulus positions 
in the color slide. The child simply marked the box in his answer book- 
let to indicate which of the four stimuli he chose, and then after 
knowledge of the correct response (KCR), turned to the next page for 
the next trial. Test response materials differed in that five, two- or 
four-choice trials were contained on a single page. Examples of 
training and test response materials are presented in Appendix A. 



Conceptual Learning Tasks 

As stated above, the training sequence involved the orderly
progression of conceptual complexity in hierarchical fashion across the
three conceptual attainment tasks. This progression involved the
addition of both relevant and irrelevant stimulus dimensions from task
to task and, for shift conditions, the changing of relevant attributes
within dimensions. Using the notation of letters representing
dimensions and numbers representing attributes, the task sequences are
schematized in Table 4 below (where + = relevant dimensions,
- = irrelevant dimensions):



Table 4

Task Sequence for the Concept Attainment Problems

Paradigm      Concept 1      Concept 2          Concept 3

No-shift      + - -          + + - -            + + + + - -
              A1, BC         A1, B1, CD         A1, B1, D1, E1, CF

Shift         + - -          + + - -            + + + + - -
              A2, BC         A3, B2, CD         A4, B3, D2, E2, CF



The problems or trials constituting these learning tasks were
generated by a computer program, such that both within and across tasks
all dimensions and attributes were arranged orthogonally under the
restriction that each trial contain one and only one positive instance.
Examples of each of these problem sequences (concept by paradigm) and of
the test items associated with each are presented in Appendix A. These
sample sequences display the problems as presented to the child, and the
"correct" choice (concept exemplar) is designated with a "+".

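The restriction that each trial contain one and only one positive instance can be illustrated with a minimal sketch. The original generating program is not described in the report, so everything here is an assumption made for illustration: the conjunctive rule, four attribute values per dimension, and the use of random sampling in place of the orthogonal arrangement of dimensions and attributes.

```python
import random


def make_trial(rng, n_dims, targets, n_values=4, n_choices=4):
    """Build one display with exactly one positive instance.

    `targets` maps a relevant dimension index to its positive attribute
    value. A stimulus is positive only if it matches every target
    (a conjunctive rule, assumed for this sketch).
    """
    def is_positive(stim):
        return all(stim[d] == v for d, v in targets.items())

    positive_pos = rng.randrange(n_choices)
    display = []
    for pos in range(n_choices):
        if pos == positive_pos:
            stim = [rng.randrange(1, n_values + 1) for _ in range(n_dims)]
            for d, v in targets.items():
                stim[d] = v  # force the relevant attributes
        else:
            # Resample until the distractor fails at least one target.
            while True:
                stim = [rng.randrange(1, n_values + 1) for _ in range(n_dims)]
                if not is_positive(stim):
                    break
        display.append(tuple(stim))
    return display, positive_pos


rng = random.Random(0)
# Concept 2, no-shift: dimensions A and B relevant (attribute 1), C and D irrelevant.
trials = [make_trial(rng, n_dims=4, targets={0: 1, 1: 1}) for _ in range(20)]
```

Each generated display holds four stimuli, and the returned position marks the single exemplar that would carry the "+" in the sample sequences.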


Apparatus 



The training and test materials were projected via a 35 mm carousel
slide projector on a screen in full view of the Ss. Also, aside from the
introduction and pretraining that was presented verbally by the E, all
subsequent training and testing instructions were presented via a
magnetic tape recorder. The Ss responded individually to each of the
training and testing problems by marking their choice in a response
booklet.
Procedure



The experiment was conducted in three phases: a short warm-up
session, the training phase, and the testing or blank trials phase.
The goals of the warm-up session were as follows:

(1) Since feedback would be presented via magnetic tape, it was
decided that confirmation of the correct response would be
given in terms of the position occupied by the positive
stimulus. Consequently, one of the goals of the warm-up
session was to test or teach the understanding of ordinal
position, and the ability to match the position of the
stimuli with the corresponding spaces provided in the
answer sheets.

(2) The second objective of the warm-up session was to acquaint 
subjects with the ultimate goal of the problem, namely to 
identify and choose the correct stimulus for each trial, and 
to discover the conceptual rule. 

The training phase consisted of a series of trials with feedback 
appropriate to the concept to be identified. Testing consisted of 
blank or nonfeedback trials. The experiment was conducted such that 
eight children were escorted from their classroom to the experimental 
room and seated. They were instructed to fill out certain information 
on the training booklet in front of them. This information included 
their name, their age, sex, and seat number. 

Children then were given some preliminary instruction concerning 
the nature of the task in which they were to engage. This pretraining 
included a brief instructional unit in which they were taught how to 
make responses for specific choices on the screen and also an introduc- 
tion as to the nature of the specific problems that they would be 
attempting to solve. Specifically, the children were told that they 
would be playing a learning game, as follows: 

The nature of the game will be for you to choose the correct 
picture when I show you several pictures on the screen like 
this (the slide projector was then turned on and four 
stimulus figures appeared on the screen). Here we see four 
pictures. This is the first picture (the E points to the 
leftmost picture), this is the second picture (E points to 
the second picture), this is the third picture, and this is 
the fourth picture (he points to the rightmost picture). 

Now look at the first page of your booklet. Do you see
those four boxes? (E waits for Ss to acknowledge). Each
one of those boxes goes with a picture you see on the 
screen. The first box would go with the first picture, 
the second box would go with the second picture, the third 
box would go with the third picture, and the fourth box 
would go with the fourth picture. Now, everybody look at 
the pictures again. Do you see the circle? (Pause) 

All right, now suppose that you wanted to choose the 
circle. Mark an X on your answer sheet that shows that 
you are choosing the circle. (Pause) How many people 
chose the third box? Raise your hand if you chose the 
third box. (Pause) All right, let's try another one. 

Turn over to the next page. (E then projects a new slide 
on the screen in which the circle moves to position 2.) 

All right, now let's see if you remember how to play this 
game. Suppose that you wanted to choose the circle again. 

Mark the box that would show that you are choosing the 
circle. How many chose the second box? (Pause) Very 
good. All right, let's try once again. (E advances to 
a new slide.) Turn to the third page. Now mark the box 
for the circle. How many marked the first box? (Pause) 

Very good. 

From now on I'll be talking to you over the tape recorder 
but I want you to keep in mind a few things that are very 
important. First, this is a learning game so you want to 
do your best but you also want to be sure that you do your 
own work. Don't be concerned with what your neighbor is 
doing because he may be doing things wrong. Second, we'll 
have a lot of problems to do and each problem goes on a 
different page. I'll tell you which page it goes on so 
you be sure you look to see that you are on the correct 
page. It is very easy to skip a page and be on the wrong 
one, so look very carefully. Third, once you've made a 
mark for your choice, don't change it. If you have a 
problem, simply raise your hand and we'll help you. All 
right? Very good. I'll be talking to you on the tape 
recorder from now on. Remember, if you have a problem, 
just raise your hand. 

The rest of the experiment was presented automatically via the
magnetic tape recorder and slide projector. Two Es participated in
this training, and occasionally a third was added to assist in the
training. For the first five or six problems, the second E stood at
the front of the room, and when the correct choice was announced via the
tape recorder he also indicated the correct choice by pointing a marker
on the projector screen. The instructions presented on the magnetic
tape recorder initially introduced the Ss to the specific nature of
the problems they would be solving as follows:

All right, boys and girls, we're now ready to begin. Now as 
we explained to you, the purpose of this game is to choose 
the correct picture. Now, when I show you a problem on the 
screen, look carefully at each picture. Then, when I tell 
you, choose one of the pictures by making a mark in your 
booklet. After everybody has had time to choose the picture,
I'll tell you which picture was right so you can see if you
chose the correct one. Now, there's a reason why certain
pictures are correct and others are not. When you discover
this reason, you'll be able to get all of the problems
right. So this means that at first you'll get some of the
problems wrong. Don't feel bad but try to find the secret
so you'll get the rest correct. Work quickly but carefully.
Do your own work and don't change any answers once you've
made them. I'll say which page each problem goes with so
you'll be sure that you're not on the wrong page. All
right, let's begin.

Here is the problem for page one. You all should be on the 
first page of your booklet. See each picture carefully. 

Now mark the one you think is correct. (Pause) If you 
marked the third picture, you were correct. The third 
picture. 

This procedure was repeated for each of the training problems. The
number of problems presented was determined by the learning condition
and the concept level of the particular training sequence.

The instructions given to the children receiving continuous testing 
(in this case, the first concept tested) are as follows: 

All right, let's continue with the game only we're going to
play it a little differently than before. Each of you has a
sheet of paper on which you have written your name. Now
I'll show you some problems just like before and for each
problem you are to choose the picture that you think is
correct. However, I'm not going to tell you which one is
correct for these problems. All right, now I'll tell you
which line you should be on for each of these problems.

Initial problems were presented at a fairly stable rate of 15 
seconds observation time and 10 seconds response interval. For later 
trials this rate was advanced to roughly 10 seconds observation time 
and 5 seconds response and feedback time (such that four problems per 
minute were presented for the later slides in the sequence). The test 
items were presented at a fairly stable rate of 15 seconds per item; 
there was no feedback interval. 

The eight Ss who served simultaneously at each session of the 
experiment actually constituted four subgroups of two Ss each. One 
subgroup of Ss remained throughout all activities for a given training 
condition, i.e., they received all training and all test items. The 
second group received only the first five of each ten item test. They 
were excused from the room and waited outside after they completed the 
first five items for each of the three tests. The third and fourth 
groups were excused from the experiment immediately following training 
for the first and second concepts. They were reintroduced after the 
tests were completed. 

All children received the first five items of the terminal test. 
However, only the first and third groups of children received the last 
five items of the terminal test. This procedure did not produce any 
noticeable negative side effects, particularly with the children who 
remained throughout the experiment (i.e., received all training and 
testing). Moreover, the children who did not receive continuous testing 
(i.e., were excused from the experiment during the first and second 
tests) appeared concerned that they were not able to participate in 
everything. 



Data Processing 



Data in the form of item choices in the training and test booklets 
were coded and transferred to punched card forms for computer pro- 
cessing. This processing included the evaluation of literal response 
protocols for the operation of various solution strategies and possible 
stimulus biases on the part of the Ss. One such bias did appear and
was associated with the pretraining and concept 1 problem received by
the shift Ss. Specifically, the circle shape used for pretraining
corresponded to the trial 1 correct figure for the concept 1 problems
under shift training. Subsequent trials tended to correct for this
false solution (alpha error) by teaching the Ss that the pattern rather
than shape was relevant. However, the initial bias did tend to lead to
many early solutions with these Ss.
Chapter III



RESULTS AND DISCUSSION 



The presentation and discussion of results in this chapter is
organized into three major sections. These sections are analysis of
training data, analysis of test data, and validation evidence for the
test model.



Analysis of Training Data 

Response protocols obtained from the training booklets were
organized according to the experimental treatment factors for purposes
of assessing training effects. The two purposes served by these
analyses were: (1) the evaluation of differential learning and transfer
effects as predicted by the two conceptual training paradigms (shift
versus no-shift) and the establishment of the functional relationships
between these training paradigms and other design factors, and (2) the
determination of the extent to which differences in performance
corresponded to the amount of training provided. This second purpose was
principally relevant to validating the assumption that differences in
probability of mastery existed and corresponded roughly to the
experimental training variables.

The dependent variable selected for this analysis was the number
of correct responses out of the last 10 training trials for each of the
three experimental concepts. Thus each S contributed three scores (one
for each concept) to this analysis. Furthermore, to control for the
effects of individual differences resulting from ability differences
or premature solutions (i.e., the concept 1 for shift groups), Ss were
blocked into High or Low groups in terms of first trials performance
on concept 1. This provided an additional factor to the
design in the form of a two level covariable block.
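
The scoring and blocking just described can be sketched as follows. The report does not state how the High and Low blocks were cut, so the median split used here is an assumption, and the response protocol and first-trials scores are hypothetical values made up for illustration.

```python
from statistics import median


def last_n_correct(responses, n=10):
    """Number correct over the final n training trials (1 = correct, 0 = error)."""
    return sum(responses[-n:])


def ability_blocks(first_trial_scores):
    """Median split into 'HI' and 'LO' blocks (the split rule is an assumption)."""
    cut = median(first_trial_scores.values())
    return {s: ("HI" if score > cut else "LO") for s, score in first_trial_scores.items()}


# Hypothetical protocol for one S on a 20-trial concept.
protocol = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
score = last_n_correct(protocol)

# Hypothetical first-trials scores on concept 1 for four Ss.
blocks = ability_blocks({"S1": 2, "S2": 5, "S3": 3, "S4": 4})
```

Each S would contribute one such score per concept, with the block assignment entering the analysis as a between-subjects factor.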

The results of a repeated measures analysis of variance performed
on these training data are summarized in Table 5. The design factors
in this analysis were transfer paradigm, training level, and ability
block (high versus low) across the three concepts trained. Performance
means comprising this analysis are presented in Table 6, both in terms
of the experimental design and in summary form in terms of design
variables.

Table 5

Analysis of Variance on Performance (Number
Correct of Last 10 Trials) Across
Training Factors and Conditions

Source                  df         MS          F

Paradigm (P)             1        21.67       1.94
Trng level (L)           2        99.21       8.90**
Ability block (B)        1       166.53      14.94***
P x L                    2         2.33       0.21
P x B                    1        12.92       1.16
L x B                    2        44.88       4.03*
P x L x B                2        17.32       1.55
Error (btwn)            84        11.14

Concept (C)              2        47.96      11.96***
C x P                    2        22.54       5.62**
C x L                    4         7.90       1.97
C x B                    2        12.41       3.09*
C x P x L                4         2.96       0.74
C x P x B                2         0.67       0.17
C x L x B                4         8.82       2.20*
C x P x L x B            4        14.75       3.68**
Error (within)         168         4.01

*   p < 0.05
**  p < 0.01
*** p < 0.001
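
The F ratios in Table 5 can be traced back to the mean squares: between-subjects effects are tested against the between-subjects error term and within-subjects effects against the within-subjects error term. The sketch below recomputes a selection of them from the tabled values. The names are illustrative, and because the tabled mean squares are already rounded, agreement is expected only to within about 0.01.

```python
# Mean squares transcribed from Table 5.
MS_ERROR_BETWEEN = 11.14  # df = 84
MS_ERROR_WITHIN = 4.01    # df = 168

between_effects = {"P": 21.67, "L": 99.21, "B": 166.53, "L x B": 44.88}
within_effects = {"C": 47.96, "C x P": 22.54, "C x B": 12.41, "C x P x L x B": 14.75}


def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect to error mean squares."""
    return ms_effect / ms_error


f_between = {k: f_ratio(v, MS_ERROR_BETWEEN) for k, v in between_effects.items()}
f_within = {k: f_ratio(v, MS_ERROR_WITHIN) for k, v in within_effects.items()}

for name, f in {**f_between, **f_within}.items():
    print(f"{name:15s} F = {f:.2f}")
```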

Table 6

Performance Means (Percent Correct) for Last
10 Trials for Each Concept
in Experimental Training

Design          First Trials               Concept
Factor          (ability)           1         2         3

No-Shift
  L1               LO             32.5      48.8      40.0
                   HI             66.2      65.0      37.5
  L2               LO             51.2      60.0      46.2
                   HI             80.0      83.8      88.8
  L3               LO             66.2      52.5      71.2
                   HI             78.8      75.0      68.8

Shift
  L1               LO             50.0      50.0      32.5
                   HI             92.5      70.0      48.8
  L2               LO             56.2      67.5      67.5
                   HI             98.8      85.0      62.5
  L3               LO             96.2      78.8      57.5
                   HI             71.2      68.8      57.5

Summary of Means

Factor          Mean          Factor              Mean

Shift           61.8          LO 1st trials       56.9
No-Shift        67.3          HI 1st trials       72.2
L1              52.8          Concept 1           70.0
L2              70.6          Concept 2           67.1
L3              70.2          Concept 3           56.6

Significant effects resulting from this analysis are as follows:

(1) Ss in level 2 or level 3 training groups performed
substantially better than Ss in level 1 training. This
effect, displayed in Figure 2, provides support for the
principal manipulation of the experiment, namely, that the
probability of concept mastery was different as a function of
training condition. However, as is evident in Figure 2, this
effect is concentrated in the level 1 versus "other" comparisons,
since the difference between level 2 and level 3 is negligible.

(2) Initial performance differences persist throughout training,
as evidenced in Figure 3. This effect suggests that individual
differences, perhaps in concept identification strategies, are
stable across training.

(3) Overall performance systematically declined across the three
experimental concepts. This outcome, plotted in Figure 4,
is consistent with the assumption that the training concepts
are ordered in terms of difficulty. Furthermore, this gradual
decline in performance evidenced in Figure 4 would necessarily
occur if, as hypothesized, prerequisites in terms of previous
concepts were either not sufficiently mastered (L1 training)
or overlearned (L3 training), thus producing negative or
"offsetting" transfer.

(4) The above curvilinear training effects interpretation receives
further support from inspection of the means presented in
Figure 5 for the training level by ability grouping interaction.
In particular, the low ability groups show a linear
performance trend in terms of training level, whereas the trend is
curvilinear for the high ability Ss. Since the high groups
arrive at a solution earlier within each training sequence,
they effectively receive more reinforced practice on the
criterial attributes than do the low ability groups. In some
instances this would be expected to result in overlearning or
a form of functional fixedness, which would interfere with or
produce negative transfer for the learning of subsequent
concepts.

On the other hand, very few of the low ability Ss appear to
reach early solutions or attainments, and therefore would be
expected to benefit from extended training. This does appear
to be the case.

(5) Further support for this differential transfer interpretation
is provided in examining the interactions of: (1) paradigm
by concept, (2) ability blocks by concept, (3) training
levels by concept by ability block, and (4) paradigm by
ability block by training level by concept. Each of these
interaction effects is significant, and each displayed a pattern
consistent with the above outlined interpretation. These
interactions are presented graphically in Figures 6, 7, 8, and 9.
Specifically, the overall decrement in performance was greater
for concepts involving intradimensional shifts than for the
no-shift paradigm (Figure 6). Furthermore, Figure 6 indicates
essentially no decrement in the no-shift condition. Figure 7
demonstrates a similar pattern with regard to initial
performance groups. The high groups show the decrement across
problems, whereas the low groups remain relatively stable. Both
groups, however, are significantly above chance across all the
problems.

Figure 8 displays this differential performance (or transfer
pattern) in terms of training level for each ability block.
This effect appears concentrated in the shift from concept 1
to concept 2, in that for the low initial groups, level 3
training groups experienced a decrement whereas level 1 and
level 2 groups tended to improve. However, the high initial
groups all show declines across the three concept problems,
again suggesting possible negative transfer (or, at least, a
return to baseline performance).

Finally, Figure 9 shows these transfer effects to be
substantially different for the two paradigms. That is,
performance tended to decline systematically for high block shift
groups, regardless of training level, whereas moderate (L2)
training appeared facilitating for the low block groups.
Effects appear negligible, if not slightly positive, for the
no-shift training method from concept 1 to concept 2 (except
for the level 3 training) and appear erratic for concept 2 to
concept 3 transfer.

FIGURE 2  MEAN PERFORMANCE (Percent Correct) ON LAST 10 TRAINING TRIALS
AS A FUNCTION OF LEVEL OF TRAINING. F (2, 84) = 8.90, P < 0.01

FIGURE 3  MEAN PERFORMANCE ON LAST 10 TRAINING TRIALS AS A FUNCTION OF
INITIAL TRIALS PERFORMANCE (Ability Block). F (1, 84) = 14.94, P < 0.001

FIGURE 4  MEAN PERFORMANCE ON LAST 10 TRIALS ACROSS THE THREE
SEQUENTIAL CONCEPTS. F (2, 168) = 11.96, P < 0.001

FIGURE 5  PLOT OF DIFFERENTIAL PERFORMANCE IN TRAINING IN TERMS OF THE
ABILITY GROUP BY TRAINING LEVEL. F (2, 84) = 4.03, P < 0.05
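
The main effects enumerated above can be traced back to the cell means of Table 6. The sketch below recomputes unweighted marginal means from those cells; it assumes equal numbers of Ss per cell, which the report does not state for the ability blocks, and all names are illustrative. Under that assumption the level, ability block, and concept averages match the summary portion of Table 6. The paradigm averages come out as 61.8 for the block printed under No-Shift and 67.3 for the block printed under Shift, which is the reverse of how the summary rows are labeled, so one of the two sets of labels may be transposed in the source.

```python
# Cell means (percent correct) transcribed from Table 6, keyed by
# (paradigm block, training level, ability block); values are concepts 1-3.
CELLS = {
    ("no-shift", 1, "LO"): (32.5, 48.8, 40.0),
    ("no-shift", 1, "HI"): (66.2, 65.0, 37.5),
    ("no-shift", 2, "LO"): (51.2, 60.0, 46.2),
    ("no-shift", 2, "HI"): (80.0, 83.8, 88.8),
    ("no-shift", 3, "LO"): (66.2, 52.5, 71.2),
    ("no-shift", 3, "HI"): (78.8, 75.0, 68.8),
    ("shift", 1, "LO"): (50.0, 50.0, 32.5),
    ("shift", 1, "HI"): (92.5, 70.0, 48.8),
    ("shift", 2, "LO"): (56.2, 67.5, 67.5),
    ("shift", 2, "HI"): (98.8, 85.0, 62.5),
    ("shift", 3, "LO"): (96.2, 78.8, 57.5),
    ("shift", 3, "HI"): (71.2, 68.8, 57.5),
}


def marginal_mean(position, value):
    """Unweighted mean over all cells whose key matches `value` at `position`."""
    scores = [s for key, row in CELLS.items() if key[position] == value for s in row]
    return sum(scores) / len(scores)


def concept_mean(index):
    """Unweighted mean for one concept (index 0, 1, or 2) over all cells."""
    return sum(row[index] for row in CELLS.values()) / len(CELLS)


level_means = {lvl: round(marginal_mean(1, lvl), 1) for lvl in (1, 2, 3)}
block_means = {blk: round(marginal_mean(2, blk), 1) for blk in ("LO", "HI")}
concept_means = {c + 1: round(concept_mean(c), 1) for c in range(3)}
paradigm_means = {p: round(marginal_mean(0, p), 1) for p in ("no-shift", "shift")}
```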



It should be noted that the interpretation of transfer as applied
to the present evidence is somewhat unorthodox. The general transfer
paradigm of:

               Train       Test

Group 1          A    -->    B
   vs.
Group 2         rest         B

is not represented in this analysis, since the principal focus of this
research was not that of experimenting on transfer. However, prior
FIGURE 6  PLOT OF THE DIFFERENTIAL EFFECTS OF THE TWO TRANSFER PARADIGMS
(Shift versus no-shift) ON TRAINING PERFORMANCE ACROSS THE
CONCEPTUAL HIERARCHY (horizontal axis: CONCEPTS 1, 2, 3).
F (2, 168) = 5.62, P < 0.01

FIGURE 7  PLOT OF THE DIFFERENT TRAINING CURVES FOR EACH OF THE TWO
ABILITY GROUPS ACROSS THE CONCEPTUAL HIERARCHY. F (2, 168) = 3.09,
P < 0.05

FIGURE 9  PLOTS OF THE DIFFERENTIAL EFFECTS OF TRAINING LEVEL ON
PERFORMANCE ACROSS THE CONCEPT HIERARCHY SEPARATELY FOR
INITIAL TRIALS GROUPINGS WITHIN EACH TRAINING PARADIGM
(panel title: NO-SHIFT PARADIGM). F (4, 168) = 3.68, P < 0.01
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To so." ft exter. t. ehe training data can srar.c alone ir. providing 
ov: dor.ee for the validity of the assured manipulations . The r.o-shi: t 
concepts v*e re responded to in a consistent tasr.ior., ar.d, coixect*.*^ -°* 
initial solution bias of the shift t ra i n i r.g , with te*er en'ors . i 
levels manipulation evidenced its effects throughout the data, and par- 
ticularly for the concept 3 solution, which was clearly the most dii- 
Moult to attain. However, at least three sources ox experimental 
error were present in these training data. These sources are; \1. the 
possibility of Ss skipping pages accidentally, x 2) of waiting until 
feedback- - or KCR- - before entering their response, and (3) of changing 
an entry following KCR. Thus, it is necessary to analyze test data to 
establish and corroborate these experimental effects. 



Analysis of Test Data 

Two separate analyses of variance on test data (blank trials) were 
performed; the first analysis was formally identical to the analysis 
performed on the training data, with the exception of the omission of 
the blocking factor. This analysis was performed on all those Ss 
receiving blank trials after each training series (i.e., one-half the 
sample). The second analysis was performed on the total sample but for
blank trial performance on concept 3 problems only. The results of 
these two analyses are presented and discussed separately. 

For each analysis, the dependent variable was the percent correct
on the blank trials. Also, since the testing factors (item form and
test length) are analyzed separately in the model validation section,
they were not included in the analyses of variance. The results of the
analysis of test performance across concepts are summarized in Table 7,
and the cell means are presented and summarized in Table 8. These
results are described and interpreted as follows:

(1) Mean performance varied significantly across the three
concept tests. Overall performance was best (most correct) on
concept 2 and poorest on concept 3 blank trials. This
performance differential was relatively uniform, with about
8 percentage points separating one average from the next.
However, the trend does not parallel that observed for training
data, in that concept 1 and concept 2 performance means are
reversed.
Table 7

Analysis of Variance of Performance
(Number Correct) During Blank
Trials Across Concepts 1, 2, and 3

Source                  df         MS          F

Paradigm (P)             1        57.51       2.83
Level (L)                2        57.33       2.82
P x L                    2        35.19       1.73
Error (btwn)            42        20.33

Concept (C)              2        25.65       4.68**
P x C                    2        50.13       8.81**
L x C                    4         9.38       1.65
P x L x C                4         7.54       1.32
Error (within)          84         5.69

** p < 0.01
Table 8

Performance Means (Percent Correct) for Blank
Trials (Tests) for Each Concept

    Design                      Concept
    Factor                 1        2        3

    No-shift    L1       36.2     46.2     31.2
                L2       62.5     66.2     71.2
                L3       33.8     75.0     73.7

    Shift       L1       65.0     73.8     46.2
                L2       75.0     61.2     42.5
                L3       90.0     93.8     62.5

    Summary

    Factor       Mean

    No-shift     55.1
    Shift        67.8
    L1           49.8
    L2           63.1
    L3           71.4
    C1           60.4
    C2           69.4
    C3           54.6

(2) The above outcome becomes more complex when the interaction
    of paradigm by concept is evaluated. Specifically, as
    demonstrated in Figure 10, mean test performance is seen to
    increase from concept 1 to concept 2 for the no-shift groups,
    whereas it displays a slight decline from concept 1 to
    concept 2 for shift Ss. The concept 2 to concept 3 differential
    is disordinal (crossover) in that by concept 3, no-shift Ss
    are averaging more correct responses than are shift Ss. Thus,
    based on overall trends, the no-shift Ss tend to show
    improvement across tests, whereas the shift Ss tend to decline. This
    outcome is consistent with the training data and with the
    transfer interpretation proposed earlier.
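
The paradigm by concept pattern described in (2) can be checked against the cell means in Table 8. The following is a minimal illustrative sketch in Python, not part of the original analysis. The dictionary layout and the helper name `paradigm_by_concept_means` are assumptions introduced here, and each paradigm mean is taken as the unweighted average of its three training-level cells.

```python
# Cell means (percent correct) transcribed from Table 8.
# Rows are training levels L1..L3; columns are concepts 1..3.
TABLE_8 = {
    "no-shift": [[36.2, 46.2, 31.2], [62.5, 66.2, 71.2], [33.8, 75.0, 73.7]],
    "shift": [[65.0, 73.8, 46.2], [75.0, 61.2, 42.5], [90.0, 93.8, 62.5]],
}


def paradigm_by_concept_means(cells):
    """Average the training-level cells within each paradigm, per concept."""
    means = {}
    for paradigm, rows in cells.items():
        n_levels = len(rows)
        n_concepts = len(rows[0])
        means[paradigm] = [
            round(sum(row[c] for row in rows) / n_levels, 1)
            for c in range(n_concepts)
        ]
    return means


means = paradigm_by_concept_means(TABLE_8)
for paradigm, values in means.items():
    print(paradigm, values)
```

Under these assumptions the no-shift means rise from concept 1 to concept 2 and stay above their starting point at concept 3, while the shift means are nearly flat from concept 1 to concept 2 and then fall below the no-shift mean at concept 3, which is the crossover the text describes.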

The results of an analysis of the total sample for concept 3 blank
trial performance (recall that only one-half of the Ss received blank
trials across concepts 1, 2, and 3) are presented in Table 9, and the
corresponding cell means are presented and summarized in Table 10.
These results show the single significant performance difference on
concept 3 blank trials to be that corresponding to training levels.
In particular, level 1 Ss averaged 38.8 percent correct, level 2
averaged 46.2 percent, and level 3 averaged 62.6 percent. These
performance averages correspond well to the training manipulations, and
likely represent a less biased (methodologically) estimate of the
experimental effects.




[Figure 10 graphic not reproduced; horizontal axis: CONCEPTS]

FIGURE 10  MEAN BLANK TRIAL PERFORMANCE (Percent Correct) ACROSS CONCEPTS
FOR EACH TRANSFER PARADIGM. F(2, 84) = 8.81, p < 0.01



Table 9

Analysis of Variance on Performance
(Number Correct) For Total Sample On
Concept 3 Blank Trials

    Source               df        MS         F

    Paradigm (P)          1      10.01      0.
    Level (L)             2      43.26      4.22*
    Group (G)             1      31.51      3.07
    P x L                 2      19.01      1.85
    P x G                 1       0.84      0.08
    L x G                 2       9.20      0.90
    P x L x G             2      10.41      1.02
    Error                84      10.26

    * p < 0.05







Table 10

Performance Means (Percent Correct) on
Concept 3 Test: Total Sample

                              Training Level
                           L1       L2       L3

    No-shift    CONT     31.2     71.2     73.8
                FNL      35.0     35.0     66.2

    Shift       CONT     46.2     42.5     62.5
                FNL      42.5     36.2     43.8

    Summary of Means

    Factor       Mean          Factor       Mean

    No-shift     52.1          CONT         54.6
    Shift        45.6          FNL          43.1
    L1           38.8          C1           60.4
    L2           46.2          C2           69.4
    L3           62.6          C3           52.5


Other outcome trends evidenced in Table 10 are a tendency for no- 
shift groups to perform better than shift groups, and a tendency for 
groups receiving blank trials throughout the experiment (continuous 
testing) to perform better than those introduced only at concept 3. 

The first of these trends appears consistent with the transfer inter- 
pretation proposed in the previous section (training effects), and the 
second trend suggests a possible familiarization (with blank trials 
procedures) effect. The important point is that both of these trends, 
and the preceding significant effects, are consistent with the results 
of the training analyses and thus provide evidence for the validity of 
the experimental manipulations. To that extent, the data do appear 
appropriate for the evaluation of the mastery test model, which is the 
principal focus of this experiment. 
Validation Evidence for the Test Model



The previous two analysis sections evaluated training and test
effects in terms of design variables, such as level of training,
paradigm, and initial performance across concepts. The analyses reported in
this section will deal with the establishment of test parameters (where
a test consists of blank trials) and of the operating characteristics
of the mastery test model. The design factors for the validation
analyses are paradigm (shift versus no-shift) by test length (five
versus ten items) by item form (two versus four choice), across the
three conceptual problems.

Three steps were followed in developing evidence for the validity
of the mastery model: (1) performing item analysis and subsequently
estimating single item reliability for data obtained under each of the
test grouping conditions for each conceptual test (this amounted to
performing 24 separate item analyses: two test lengths by two item
forms by two paradigms by three concepts equals 24); (2) generating
optimal cut rules (pass/fail) for each of these "tests" through
implementation of the mastery model using parameters generated in (1),
above; and (3) estimating the concurrence of mastery/nonmastery
decisions based on test data to training data, and evaluating these
correspondences in terms of design factors (level of training and so forth)
and of goodness of fit for overall test distributions.



Item Analyses

The results of each of the 24 item analyses are presented in Ap- 
pendix B. For each analysis, subject by item responses are presented, 
as are total scores and item difficulties (percent passing the item). 
Test statistics presented for each such analysis are the mean, standard 
deviation, reliability (as estimated by Kuder Richardson formula 20), 
and the average item reliability (as estimated by the Spearman Brown 
formula). These means, standard deviations, and reliabilities are 
summarized in Table 11. 
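
The statistics named above can be illustrated with a short Python sketch. This is not the original program; it follows the arithmetic of the Appendix B listing as it can be read from the scan (unbiased variance of total scores, KR-20, standard error, and the Spearman-Brown formula applied to a test one item long). The function name `item_analysis` and the variable names are assumptions introduced here, and the response protocol is transcribed from the first item analysis in Appendix B (no-shift, two choice, five item, concept 1). Degenerate inputs (zero variance, or a reliability above 1) are not handled.

```python
from math import sqrt


def item_analysis(responses):
    """Summary statistics for a matrix of 0/1 item scores (one row per examinee)."""
    n_cases = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_cases
    # Unbiased (n - 1) variance of the total scores.
    var = (n_cases * sum(t * t for t in totals) - sum(totals) ** 2) / (
        n_cases * (n_cases - 1)
    )
    # Item difficulties p (proportion passing) and the sum of p * q.
    p = [sum(row[i] for row in responses) / n_cases for i in range(n_items)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    # Kuder-Richardson formula 20.
    kr20 = (n_items / (n_items - 1)) * (1.0 - sum_pq / var)
    sd = sqrt(var)
    std_error = sd * sqrt(1.0 - kr20)
    # Spearman-Brown, stepped down to a test of length one item.
    k = 1.0 / n_items
    item_r = (k * kr20) / (1.0 + (k - 1.0) * kr20)
    return {
        "mean": mean,
        "sd": sd,
        "kr20": kr20,
        "se": std_error,
        "item_r": item_r,
        "difficulty": p,
    }


# Subject-by-item responses for the first Appendix B protocol.
PROTOCOL = [
    [0, 0, 0, 1, 0],  # subject 323
    [1, 1, 0, 1, 1],  # subject 324
    [1, 1, 1, 1, 1],  # subject 133
    [1, 1, 1, 1, 1],  # subject 134
    [0, 1, 1, 1, 1],  # subject 203
    [0, 1, 0, 1, 0],  # subject 204
]

stats = item_analysis(PROTOCOL)
print({name: value for name, value in stats.items() if name != "difficulty"})
```

For this protocol the sketch gives a mean of 3.50, a standard deviation near 1.643, and a KR-20 near 0.851, in line with the corresponding entries in Table 11.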

Inspection of this table fails to reveal any clearly systematic 
patterns across concepts, particularly in terms of overall test reli- 
ability. The obtained values, however, are generally high and accept- 
able, particularly for such "short" tests. Also the ten item tests do 
appear to yield, on the average, higher reliability values than do the 
five item versions. But the expected result of increase in reliability
with increase in test length does not consistently occur. The two
exceptions both occur with two choice tests and thus may be due to chance
Table 11

Summary of Means, Standard Deviations, and
Reliabilities from Test Item Analyses

    Design               Concept 1             Concept 2             Concept 3
    Factor             X     SD     R        X     SD     R        X     SD     R

    No-shift
     5 Item-2CH      3.50  1.643  .851     4.17  1.169  .716     3.67  1.155  .391
     5 Item-4CH      1.00  1.265  .686     2.67  1.366  .543     1.42  1.881  .899
    10 Item-2CH      5.50  3.987  .950     6.00  2.098  .564     6.58  2.275  .671
    10 Item-4CH      3.17  3.125  .899     5.33  3.266  .880     4.08  4.562  .984

    Shift
     5 Item-2CH      4.67  0.516           4.00  1.095  .440     2.08  1.564  .651
     5 Item-4CH      3.00  2.280  .970     4.00  1.265  .642     2.67  1.557  .670
    10 Item-2CH      8.00  3.098  .938     7.00  4.147  .981     5.92  1.782  .326
    10 Item-4CH      7.33  4.131  .988     7.50  3.507  .958     2.83  2.725  .820

effects associated with tests of this type (i.e., the true-false test
analogue). In all, the results of this analysis were considered
acceptable for purposes of generating mastery cut rules.






Mastery Cut Rules 



Three parameters are necessary to compute the optimal mastery cut
rule using the formula and model described earlier in this report.
These parameters are the length of the test, the item error probability
and corresponding class (false pass versus false fail likelihood), and
the prior probability of mastery (also described as the relative
decision error weights). The test length parameter thus corresponds to
five versus ten item tests. Item errors are estimated from total test
reliability and are distributed in terms of item form; prior
probability of mastery is seen to correspond to training level (L1, L2, or L3).

Using the estimates of average item reliabilities provided by the
item analyses, a matrix of cut rules was computed for each of the 24
test item groups. These matrices, presented below each corresponding
item analysis in Appendix B, provide percent correct cutoffs and number
correct values required for a mastery decision for each of several
prior mastery likelihoods by each of several relative item error
weights. The prior mastery likelihoods are 100:1, 10:1, equal, 1:10,
and 1:100. Alpha (false pass) to beta (false fail) item weights, for
a given item error likelihood as estimated from the preceding item
analysis, are varied as follows: 10:1, 5:1, 3:1, 2:1, 1:1, 1:2, 1:3,
1:5, 1:10. Corresponding cut rules then are listed for these 45
combinations (i.e., the five prior probabilities or ERR WT by the nine
relative item error weights or alpha to beta ratio).
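
The 45-cell matrix can be sketched in Python as follows. This is an illustrative reading of the cut-rule arithmetic in the Appendix B program listing, not an authoritative statement of the model: the listing is a degraded scan, and in particular the division of the log error weight by the number of cases is taken from the listing as read. The names `cut_rule`, `cut_rule_matrix`, `ALPHA_BETA_RATIOS`, and `ERROR_WEIGHTS` are assumptions introduced here. The sketch does not guard against item reliabilities of 0 or 1, for which the original program printed no cut rules.

```python
from math import log, sqrt

# Column and row headings of the cut-rule matrices in Appendix B.
ALPHA_BETA_RATIOS = [10.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.33, 0.2, 0.1]
ERROR_WEIGHTS = [100.0, 10.0, 1.0, 0.1, 0.01]


def cut_rule(item_r, ratio, err_wt, n_cases, n_items):
    """Return (proportion-correct cutoff, items to pass) for one matrix cell."""
    # Total item error implied by the average item reliability.
    total_error = 1.0 - sqrt(item_r)
    # Split the error into false-pass (alpha) and false-fail (beta) parts.
    alpha = total_error / (ratio + 1.0)
    beta = total_error - alpha
    ca, cb = 1.0 - alpha, 1.0 - beta
    cut = (log(beta / ca) + log(err_wt) / n_cases) / log(
        (alpha * beta) / (ca * cb)
    )
    # Round to the nearest whole number of items, truncating as the listing does.
    return cut, int(cut * n_items + 0.5)


def cut_rule_matrix(item_r, n_cases, n_items):
    """Build the 5 x 9 matrix keyed by error weight, one list entry per ratio."""
    return {
        w: [cut_rule(item_r, r, w, n_cases, n_items) for r in ALPHA_BETA_RATIOS]
        for w in ERROR_WEIGHTS
    }


# Average item reliability .3357, 6 cases, 5 items: the second Appendix B test.
matrix = cut_rule_matrix(0.3357, n_cases=6, n_items=5)
for weight, row in matrix.items():
    print(weight, [round(cut, 3) for cut, _ in row], [n for _, n in row])
```

With these inputs the cell for ERR WT 100 and a 1:1 ratio comes out near .210 with one item to pass, which agrees with the printed matrix for that test.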

The cut rules were applied to the test data as follows: training
level was considered equivalent to prior probability of mastery (ERR WT)
such that L1 = 0.01, L2 = 1.0, and L3 = 10.0. Item form was considered
equivalent to relative item error (alpha to beta ratio) such that two
choice = 0.330 and four choice = 0.500. For each test, each score was
evaluated in terms of training factors (L1, L2, L3) and item form
(two or four choice), and those scores that did not equal or exceed the
derived mastery value were interpreted as reflecting nonmastery. This
procedure was followed for each of the 24 tests as presented in
Appendix B. The specific range of cut rules that were applied to the
corresponding test scores is enclosed by the box within each matrix.
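
The score-by-score application of the rules can be illustrated with a small Python sketch. It is a hypothetical example, not the original procedure: the lookup table `ITEMS_TO_PASS`, the function `classify`, and the data layout are assumptions introduced here. The three "items to pass" values are the ones the cut-rule arithmetic yields for the first Appendix B test (no-shift, two choice, five item, concept 1) in the 0.330 column, and the scores are that test's six cases, listed in pairs within training level.

```python
# Mappings stated in the text: training level -> ERR WT, item form -> ratio.
ERR_WT_BY_LEVEL = {"L1": 0.01, "L2": 1.0, "L3": 10.0}
RATIO_BY_ITEM_FORM = {"two-choice": 0.330, "four-choice": 0.500}

# Illustrative "items to pass" values keyed by (ERR WT, ratio); assumed here
# for one test only. A full table would hold one entry per matrix cell used.
ITEMS_TO_PASS = {(0.01, 0.330): 4, (1.0, 0.330): 3, (10.0, 0.330): 3}


def classify(score, level, item_form, items_to_pass=ITEMS_TO_PASS):
    """Scores that equal or exceed the cut value count as mastery."""
    key = (ERR_WT_BY_LEVEL[level], RATIO_BY_ITEM_FORM[item_form])
    return "mastery" if score >= items_to_pass[key] else "nonmastery"


# (subject I.D., training level, total score) for the six cases.
CASES = [
    ("323", "L1", 1),
    ("324", "L1", 4),
    ("133", "L2", 5),
    ("134", "L2", 5),
    ("203", "L3", 4),
    ("204", "L3", 2),
]

decisions = {
    subject: classify(score, level, "two-choice")
    for subject, level, score in CASES
}
print(decisions)
```

The point of the sketch is that the same raw score can lead to different decisions depending on training level, because the cut value is looked up per case rather than fixed for the whole test.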

To demonstrate the operating characteristics of this evaluation
model, overall test data were aggregated into two distributions, one
for each test length (five versus ten items). Each of these
distributions is comprised of both mastery and nonmastery scores. Application
of cut rules as generated by the model on a score by score basis
operates to differentiate the two component distributions from the overall
distribution. This differentiation is shown in Figure 11 for the five
[Figure 11 graphic not reproduced; horizontal axis: TEST SCORE]

FIGURE 11  DISTRIBUTION OF SCORES ON FIVE ITEM TESTS FOR THE TOTAL SAMPLE
(M + M̄ Groups) AND FOR THE INDIVIDUAL M AND M̄ GROUPS AS
EVALUATED BY THE MASTERY MODEL



item tests, and in Figure 12 for the ten item tests. In both of these
figures, it is clear that the decisions based on the model, applied
case by case, are quite different from those that would result from
the application of a single across-the-board cut rule. Furthermore,
in the case of the ten item distribution, substantial overlap is
evident for mastery and nonmastery distributions.

To establish the concurrent validity of these evaluations, test 
cut rules were applied to corresponding training data (last five trials 
for five item tests, last ten trials for ten item tests) and evaluated 
for concurrence of "fit" using chi square procedures. These analyses 
were performed across all design factors, and separating for item type, 
conceptual test, training level, and for level by item type. The 
results of these analyses are summarized in Table 12 and are inter- 
preted as supporting both the conclusions drawn from training and test 
analyses, and those regarding the validity and utility of the
evaluation model.
[Figure 12 graphic not reproduced; horizontal axis: TEST SCORE]

FIGURE 12  DISTRIBUTION OF SCORES ON 10 ITEM TESTS FOR THE TOTAL SAMPLE
AND FOR INDIVIDUAL M AND M̄ GROUPS AS EVALUATED BY THE
MASTERY MODEL



Specifically, the overall training-testing contingency (χ² = 31.51)
establishes the upper limit for evidence of validity that can be based
on concurrence of training-testing mastery decisions. Since the cut
rule formula is optimal in terms of the reliability of the test
involved, one estimate of the concurrent validity of the model is the
extent to which the training-test contingency corresponds to the mean
test item reliability. Expressing this estimate as a ratio of train-test
contingency to mean test item reliability, the apparent concurrent
validity of the test model is 0.375/0.395 = 0.949. In other words,
given the unreliability of the tests involved and the assumption of
training-testing correspondence, the evaluation model appears 95 percent
effective in optimizing test information.
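
The report does not state how the contingency value of 0.375 was obtained from the chi square. One reading that reproduces it is Pearson's contingency coefficient, C = sqrt(χ² / (χ² + N)), with the overall χ² of 31.51 and N of 192 from Table 12. The Python sketch below works through that arithmetic under this assumption; the function name `contingency_coefficient` is introduced here for illustration only.

```python
from math import sqrt


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient for a chi square based on n cases."""
    return sqrt(chi_square / (chi_square + n))


# Overall training-testing chi square and N, as reported in Table 12.
c = contingency_coefficient(31.51, 192)

# Ratio of the contingency to the mean test item reliability (0.395 in the text).
effectiveness = c / 0.395

print(round(c, 3), round(effectiveness, 2))
```

On this reading the coefficient is about 0.375 and the ratio is about 0.95, matching the figures quoted in the text.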

Other evidence for the validity of the model is drawn from similar
correspondence of train-test contingencies as a function of design factor.
For example, the model appears essentially equally effective in
differentiating mastery states using two choice (χ² = 18.52) and four
choice (χ² = 20.15) tests. This is particularly impressive given that
Table 12

Chi Square Values for Concurrence of Mastery Evaluations
of Training and Test Status, Based on Test Data as Criteria

    Level of Analysis           N        χ²        p

    Overall                   192      31.51     <0.001
    2-Choice tests             96      18.51     <0.001
    4-Choice tests             96      20.15     <0.001
    Concept 1 tests            48      19.85     <0.001
    Concept 2 tests            48      19.93     <0.001
    Concept 3 tests            96      11.09     <0.001
    Level 1 training           64       5.99     <0.02
    Level 2 training           64       7.42     <0.01
    Level 3 training           64      11.00     <0.001
    2-Choice, L1 TRNG          32       3.56      0.06
              L2 TRNG          32       2.38      0.15
              L3 TRNG          32       3.41      0.07
    4-Choice, L1 TRNG          32       1.74      0.20
              L2 TRNG          32       4.57     <0.05
              L3 TRNG          32       9.50     <0.01

conventionally these tests produce quite different results. Thus, the
model appears capable of dealing with the item-error parameter (α/β)
quite effectively. Also, the training-test correspondence patterns
nicely in terms of training level. That is, a systematic transition
of cases from the fail-fail to pass-pass contingencies occurs from
L1 to L3 data.

Finally, goodness of fit tests were applied to the mastery and
nonmastery distributions separately for five and ten item tests. The
theoretical distributions against which each of these empirical curves
was compared are probability distributions of the form:

    $P(n \mid s) = C_{n,\,t-n}\; p^{n}\, q^{\,t-n}$

where

    n = test score (number correct)
    C = binomial coefficient
    s = mastery state
    t = test length
    p = item error probability for state s
    q = 1 - p
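
The theoretical distribution above, and a chi square comparison against observed score counts, can be sketched in Python as follows. This is an illustration only: the function names `score_distribution` and `chi_square_fit` are assumptions introduced here, `p` is treated simply as the per-item probability that enters the formula for a given mastery state, and the example value of 0.8 is hypothetical rather than taken from the report.

```python
from math import comb


def score_distribution(t, p):
    """Binomial probabilities P(n | s) for scores n = 0..t on a t-item test."""
    q = 1.0 - p
    return [comb(t, n) * p**n * q ** (t - n) for n in range(t + 1)]


def chi_square_fit(observed_counts, t, p):
    """Chi square statistic comparing observed score counts with expectation."""
    total = sum(observed_counts)
    expected = [total * prob for prob in score_distribution(t, p)]
    return sum(
        (obs - exp) ** 2 / exp
        for obs, exp in zip(observed_counts, expected)
        if exp > 0
    )


# Hypothetical example: a five item test with p = 0.8.
probs = score_distribution(5, 0.8)
print([round(x, 4) for x in probs])
```

A composite of two such distributions with different `p` values would generally not be fitted well by any single binomial curve, which is the kind of departure the text suggests for the ten item tests.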

The results of these tests were favorable for the five item
distributions, showing reasonably good theoretical and empirical
correspondence for mastery (χ² = 12.8, p = 0.05), and very good fit for
nonmastery (χ² = 5.99, 0.6 > p > 0.5) distributions. The ten item
distributions, however, showed rather poor fit to the expected curves
(mastery χ² = 46.25, p < 0.01; nonmastery χ² = 74.83, p < 0.01).
One likely reason is that these ten item distributions are multinomial,
or at least composites of two or more binomial distributions. As such,
they might comprise the basis of subsequent tests of fit in replicate
analyses.



Chapter 4 



CONCLUSIONS 



The results of the analysis of data obtained from this experiment
provide several types of evidence that bear directly on the validity
and utility of the proposed mastery evaluation model. The first of
these classes of evidence pertains to the assumptions of a cumulative
and hierarchical learning model, similar to that proposed by Gagne and
incorporated in individualized instructional systems such as IPI.
Support for these assumptions is provided from the analysis results on
training data, in which a curvilinear transfer effect appears to occur
for the shift but not for the no-shift paradigm. The effect becomes more
dramatic when controlling for premature solutions such that overall, the
data strongly favor the cumulative hierarchical model.

Analysis of training and test performance also supplies strong
corroborative support for the assumption that variations in training
experience around the expected requirements (as derived from the
stimulus sampling model of Estes or the relevant and redundant cue model
of Trabasso and Bower) effectively produce systematic and operational
differences in likelihood of concept attainment (mastery). This outcome
is important, since it establishes an empirical basis for the
subsequent validation of the test mastery evaluation model.

The evidence derived in support of this model, although not 
striking or dramatic, is nonetheless favorable. It was shown that the 
evaluation model was optimal to the extent that it was 95 percent 
effective in matching test performance with mastery state, given the 
constraints implied by the training-testing contingency. It is also 
concluded that the theoretical and actual test distributions show 
reasonably good concurrence for the short (five item) tests, but that 
the fit is not so good for the longer tests. This outcome is favorable 
at least in part, since the model is designed primarily for use with 
extremely brief tests. 

Therefore, it is concluded that the evidence obtained from this 
research is supportive of the assumptions of the mastery evaluation 
model with respect to single skill mastery/nonmastery decisions. To 
further establish the validity of this model, research should be 
undertaken in which content valid curricula constitute the material
being taught and in which single skill criterion referenced tests, 
similar to IPI "curriculum imbedded tests," constitute the measuring 
instruments. Such a study would incorporate branching and tracking, 
or path analysis of children subsequent to sequential mastery/non- 
mastery decisions. A study of this nature could be used both to 
further establish the apparent validity of the model, and to more 
closely appraise its characteristics on a cost-benefit basis. 
EXHIBIT 1  SAMPLE SEQUENCE OF TRAINING TRIALS FOR CONCEPT 1. THESE
STIMULI WERE USED BOTH FOR SHIFT AND NO-SHIFT GROUPS.
(S+ = Shift Key, NS+ = No-Shift Key)

EXHIBIT 2  SAMPLE TRAINING ITEMS FOR NO-SHIFT, CONCEPT 2

EXHIBIT 3  SAMPLE TRAINING ITEMS FOR SHIFT, CONCEPT 2

EXHIBIT 4  SAMPLE TRAINING ITEMS FOR NO-SHIFT, CONCEPT 3

EXHIBIT 5  SAMPLE TRAINING ITEMS FOR SHIFT, CONCEPT 3

[Exhibit graphic not reproduced; legible stimulus labels: XXXX, YELLOW, GREEN, RED]

EXHIBIT 6  SAMPLE TEST ITEMS (Two and Four Choice)

SAMPLE COVER SHEET FOR CONCEPT TRAINING RESPONSE BOOKLET
(form fields: NO., NAME, SCHOOL, GRADE, CLASS, AGE, BOY, GIRL)

SAMPLE RESPONSE FORM FOR A SINGLE TRIAL IN CONCEPT TRAINING

EXHIBIT 7  SAMPLE COVER SHEET AND RESPONSE FORM FOR TRAINING

EXHIBIT 8  SAMPLE TEST RESPONSE FORM FOR TWO CHOICE TEST
Appendix B

COMPUTER PROGRAM AND RESULTS OF ITEM ANALYSES
AND MASTERY DECISION RULES



The contents of this appendix present the computer program used to
compute the item analyses and the corresponding mastery decision rules
for each of the 24 test groupings included in the experiment. These
groupings were by test length (five or ten item) by item form (two or
four choice) by concept (1, 2, or 3) for each of the two training
paradigms.

The output for each of these test group item analyses includes
item response protocol, item difficulty (percent pass), test mean,
standard deviation, KR 20 reliability estimate, and estimated single
item reliability.

Immediately following the item analysis output is listed a matrix
of decision rules for the analyzed test, arranged in terms of mastery
likelihood (ERR WT) and item type (ALPHA TO BETA RATIO). The encased
values (a percent value and a corresponding "items to pass" value)
correspond to the rules applied to the above test scores.

It should be noted that the cases and test scores are listed in
pairs within training levels, such that the first two cases were
trained under level 1, the next pair under level 2, and the third pair
under level 3. This pattern is repeated for test groups based on 12
cases. Also, when the test reliability estimate approached values of
zero or unity, test cut rules were not calculated, and an M/M̄ split of
50 percent was assumed.
      PROGRAM SCORE (INPUT, OUTPUT, TAPE5=INPUT, TAPE6=OUTPUT)
      DIMENSION PROB(10), X(30,30), S(30), P(30), N(10), FMT(12),
     1ID(30), KX(30,30), KP(30)
      DIMENSION WAB(9), WER(9), ALPHA(9), BETA(9), CUT(9,5), NCUT(9,5)
C
C     READ IN ALPHA/BETA AND ERROR WEIGHTS
C
      READ (5,105) WAB
      READ (5,105) WER
  105 FORMAT (10F4.0)
C
C     READ IN PROBLEM INFORMATION
C
    1 READ (5,100) PROB, NC, NI
      IF (PROB(1) .EQ. 6HFINISH) GO TO 86
      WRITE (6,500) PROB, NC, NI
      READ (5,101) FMT
C
C     READ IN DATA VALUES FOR THIS PROBLEM. INITIALIZE PARAMETERS
C
      READ (5,FMT) (ID(I), (X(I,J), J=1,NI), I=1,NC)
      XN = NC
      SUM = 0.
      SSQ = 0.
    2 DO 10 J = 1,NC
      S(J) = 0.
C
C     CALCULATE TEST SCORES, MEAN AND VARIANCE
C
      DO 11 I = 1,NI
      KX(J,I) = X(J,I)
   11 S(J) = S(J) + X(J,I)
      SSQ = SSQ + S(J)**2
   10 SUM = SUM + S(J)
      XMEAN = SUM/XN
      VAR = ((XN*SSQ) - SUM**2) / (XN*(XN-1.))
C
C     CALCULATE ITEM DIFFICULTIES (P AND Q).
C
      SPQ = 0.
      XI = NI
      DO 12 I = 1,NI
      P(I) = 0.
      DO 13 J = 1,NC
   13 P(I) = P(I) + X(J,I)
      P(I) = P(I)/XN
      KP(I) = (P(I)*100.) + .5
      Q = 1.0 - P(I)
   12 SPQ = SPQ + Q*P(I)
C
C     CALCULATE TEST RELIABILITY (KR20).
C
      SD = SQRT(VAR)
      RELIB = (XI/(XI-1.))*(1. - SPQ/VAR)
      SE = SD * SQRT(1. - RELIB)
C
C     CALCULATE ESTIMATED ITEM R (SB EST).
C
      WT = 1./XI
      AVER = (WT*RELIB) / (1. + ((WT-1.)*RELIB))
C
C     PRINT OUTPUT AND CHECK FOR NEXT PROBLEM.
C
      DO 14 K = 1,10
   14 N(K) = K
      WRITE (6,501) N
      DO 15 I = 1,NC
      IF (NI.EQ.10) GO TO 16
      WRITE (6,502) ID(I), (KX(I,J), J=1,NI), S(I)
      GO TO 15
   16 WRITE (6,503) ID(I), (KX(I,J), J=1,NI), S(I)
   15 CONTINUE
      IF (NI.EQ.5) GO TO 20
      WRITE (6,504) (KP(I), I=1,NI), XMEAN, SD, RELIB, SE, AVER
      GO TO 150
   20 WRITE (6,505) (KP(I), I=1,NI), XMEAN, SD, RELIB, SE, AVER
  150 IF (RELIB.GE.1. .OR. RELIB.LE.0.) GO TO 85
C
C     CALCULATE CUT RULES
C
      VAL = 1. - SQRT(AVER)
      DO 75 I = 1,9
      ALPHA(I) = VAL / (WAB(I) + 1.)
      BETA(I) = VAL - ALPHA(I)
      CA = 1. - ALPHA(I)
      CB = 1. - BETA(I)
      DO 76 J = 1,5
      CUT(I,J) = (ALOG(BETA(I)/CA) + (1./XN)*ALOG(WER(J))) /
     1 ALOG((ALPHA(I)*BETA(I)) / (CA*CB))
      NCUT(I,J) = (CUT(I,J) * XI) + .5
   76 CONTINUE
   75 CONTINUE
C
C     PRINT CUT-RULE OUTPUT
C
      WRITE (6,600) WAB
      DO 77 J = 1,5
   77 WRITE (6,601) WER(J), (CUT(I,J), I=1,9), (NCUT(I,J), I=1,9)
      WRITE (6,602) VAL, (ALPHA(I), I=1,9), (BETA(I), I=1,9)
  602 FORMAT (*0ERROR   ALPHA/BETA VALUES*/F5.3,* ALPHA*
     1 F5.3,8F8.3/6X,* BETA* F5.3,8F8.3)
  600 FORMAT (*1 TEST CUT RULES*/20X,*ALPHA TO BETA RATIO* /
     1* ERR WT * 9F8.3 /)
  601 FORMAT (*0*F6.2,2X,9F8.3/8X,9I8)
      GO TO 1
   85 WRITE (6,850)
  850 FORMAT (*0 CUT RULES NOT COMPUTED FOR THIS TEST.*)
      GO TO 1
C     FORMAT 860, REFERENCED BELOW, DOES NOT APPEAR IN THE LISTING.
   86 WRITE (6,860)
      STOP
  100 FORMAT (10A6, 2I6)
  101 FORMAT (12A6)
  500 FORMAT (*1 PROBLEM IDENTIFICATION...*10A6//
     1* NUMBER OF CASES . . . *I5/* NUMBER OF ITEMS . . . *I5/)
  501 FORMAT (//* SUBJECT*10X,*I T E M*
     114X,*TOTAL*/* I.D. *10I3,* SCORE*/)
  502 FORMAT (2X,A6,5I3,15X,F6.1/)
  503 FORMAT (2X,A6,10I3,F6.1/)
  504 FORMAT (*0PERCENT*/* PASS *10I3///
     1* TEST MEAN . . . . . . *F6.2//
     2* STANDARD DEVIATION . . *F7.3//
     3* RELIABILITY (KR20) . . *F8.4//
     4* STANDARD ERROR . . . . *F8.4//
     5* AVERAGE ITEM R . . . . *F8.4/)
  505 FORMAT (*0PERCENT*/* PASS * 5I3///
     1* TEST MEAN . . . . . . *F6.2//
     2* STANDARD DEVIATION . . *F7.3//
     3* RELIABILITY (KR20) . . *F8.4//
     4* STANDARD ERROR . . . . *F8.4//
     5* AVERAGE ITEM R . . . . *F8.4/)







PROBLEM IDENTIFICATION...NO-SHIFT 2-OPT 5-ITEM CONCEPT 1 TEST

NUMBER OF CASES . . . 6
NUMBER OF ITEMS . . . 5

SUBJECT          I T E M                      TOTAL
I.D.      1  2  3  4  5  6  7  8  9 10        SCORE

323       0  0  0  1  0                         1.0
324       1  1  0  1  1                         4.0
133       1  1  1  1  1                         5.0
134       1  1  1  1  1                         5.0
203       0  1  1  1  1                         4.0
204       0  1  0  1  0                         2.0

PERCENT
PASS     50 83 50100 67

TEST MEAN . . . . . .     3.50
STANDARD DEVIATION . .   1.643
RELIABILITY (KR20) . .   .8513
STANDARD ERROR . . . .   .6336
AVERAGE ITEM R . . . .   .5339

TEST CUT RULES
                           ALPHA TO BETA RATIO
 ERR WT   10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

 100.00     .128    .158    .190    .223    .294    .377    .427    .484    .553
               1       1       1       1       1       2       2       2       3
  10.00     .208    .248    .286    .323    .397    .477    .523    .574    .633
               1       1       1       2       2       2       3       3       3
   1.00     .287    .337    .382    .423    .500    .577    .619    .663    .713
               1       2       2       2       2       3       3       3       4
    .10     .367    .428    .478    .523    .603    .677    .714    .752    .792
               2       2       2       3       3       3       4       4       4
    .01     .447    .516    .574    .623    .706    .777    .810    .842    .872
               2       3       3       3       4       4       4       4       4

ERROR   ALPHA/BETA VALUES
 .269   ALPHA   .024    .045    .067    .090    .135    .180    .203    .224    .245
        BETA    .245    .224    .202    .180    .135    .090    .067    .045    .024









PROBLEM IDENTIFICATION...NO-SHIFT 2-OPT 5-ITEM CONCEPT 2 TEST

NUMBER OF CASES . . . 6
NUMBER OF ITEMS . . . 5

SUBJECT          I T E M                      TOTAL
I.D.      1  2  3  4  5  6  7  8  9 10        SCORE

323       0  0  1  1  0                         2.0
324       1  0  1  1  1                         4.0
133       1  1  1  1  1                         5.0
134       1  1  1  1  1                         5.0
203       1  1  1  1  1                         5.0
204       0  1  1  1  1                         4.0

PERCENT
PASS     67 67100100 83

TEST MEAN . . . . . .     4.17
STANDARD DEVIATION . .   1.169
RELIABILITY (KR20) . .   .7165
STANDARD ERROR . . . .   .6225
AVERAGE ITEM R . . . .   .3357

TEST CUT RULES
                           ALPHA TO BETA RATIO
 ERR WT   10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

 100.00     .042    .065    .094    .128    .210    .315    .381    .456    .544
               0       0       0       1       1       2       2       2       3
  10.00     .145    .185    .226    .267    .355    .454    .512    .575    .647
               1       1       1       1       2       2       3       3       3
   1.00     .249    .305    .358    .407    .500    .593    .644    .695    .751
               1       2       2       2       3       3       3       3       4
    .10     .353    .425    .489    .546    .645    .733    .775    .815    .855
               2       2       2       3       3       4       4       4       4
    .01     .456    .544    .621    .685    .790    .872    .906    .935    .958
               2       3       3       3       4       4       5       5       5

ERROR   ALPHA/BETA VALUES
 .421   ALPHA   .038    .070    .105    .140    .210    .280    .316    .350    .382
        BETA    .382    .350    .315    .280    .210    .140    .104    .070    .038







PERCENT
PASS     67 92 75 58 75

TEST MEAN . . . . . .     3.67
STANDARD DEVIATION . .   1.155
RELIABILITY (KR20) . .   .3906
STANDARD ERROR . . . .
AVERAGE ITEM R . . . .   .1136

TEST CUT RULES
                           ALPHA TO BETA RATIO
 ERR WT   10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

 100.00     .026    .049    .082    .123    .227    .363    .448    .541    .645
               0       0       0       1       1       2       2       3       3
  10.00     .108    .152    .200    .251    .363    .492    .566    .644    .727
               1       1       1       1       2       2       3       3       4
   1.00     .191    .254    .318    .380    .500    .620    .683    .746    .809
               1       1       2       2       2       3       3       4       4
    .10     .273    .356    .436    .508    .637    .749    .801    .848    .892
               1       2       2       3       3       4       4       4       4
    .01     .356    .459    .554    .637    .773    .877    .919    .951    .974
               2       2       3       3       4       4       5       5       5

ERROR   ALPHA/BETA VALUES
 .663   ALPHA   .060    .110    .166    .221    .331    .442    .498    .552    .603
        BETA    .603    .552    .497    .442    .331    .221    .164    .110    .060






PROBLEM IDENTIFICATION...NO-SHIFT 4-OPT 5-ITEM CONCEPT 1 TEST   (1.21.1)

NUMBER OF CASES . . . 6
NUMBER OF ITEMS . . . 5

SUBJECT          I T E M                      TOTAL
I.D.      1  2  3  4  5  6  7  8  9 10        SCORE

103       0  0  1  0  0                         1.0
104       1  1  0  1  0                         3.0
213       0  0  1  1  0                         2.0
214       0  0  0  0  0                         0.0
153       0  0  0  0  0                         0.0
154       0  0  0  0  0                         0.0

PERCENT
PASS     17 17 33 33  0

TEST MEAN . . . . . .     1.00
STANDARD DEVIATION . .   1.265
RELIABILITY (KR20) . .   .6858
STANDARD ERROR . . . .   .7091
AVERAGE ITEM R . . . .   .3038

TEST CUT RULES
                           ALPHA TO BETA RATIO
 ERR WT   10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

 100.00     .025    .046    .074    .107    .191    .300    .369    .448    .540
               0       0       0       1       1       2       2       2       3
  10.00     .133    .172    .213    .255    .345    .448    .509    .574    .649
               1       1       1       1       2       2       3       3       3
   1.00     .242    .299    .353    .401    .500    .597    .648    .701    .758
               1       1       2       2       3       3       3       4       4
    .10     .351    .426    .493    .552    .655    .745    .788    .828    .867
               2       2       2       3       3       4       4       4       4
    .01     .460    .552    .632    .700    .809    .893    .927    .954    .975
               2       3       3       3       4       4       5       5       5

ERROR   ALPHA/BETA VALUES
 .449   ALPHA   .041    .075    .112    .150    .224    .299    .337    .374    .408
        BETA    .408    .374    .337    .299    .224    .150    .111    .075    .041






PROBLEM IDENTIFICATION...NO-SHIFT 4-OPT 5-ITEM CONCEPT 2 TEST (1.21.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 5

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

103      1   1   0   1   0                          3.0
104      1   1   1   0   0                          3.0
213      0   1   0   1   0                          2.0
214      0   0   0   1   0                          1.0
153      0   1   0   0   1                          2.0
154      1   1   1   1   1                          5.0

PERCENT
PASS    50  83  33  67  33

TEST MEAN 2.67
STANDARD DEVIATION . . 1.366
RELIABILITY (KR20) . . .5431
STANDARD ERROR .... .9235
AVERAGE ITEM R .... .1921



TEST CUT 


rules 


ALPHA 


TO dET \ 


RATIO 












E»R *T 


10.000 


5,000 


3.000 


2.000 


1.000 


.500 


.330 


.200 


• 100 


ino. 00 


-.051 


-.0A5 


• .026 


.OOA 


.09? 


.221 


.307 


, AO A 


.510 




0 


0 


0 


0 


0 


I 


2 


2 


3 


10.00 


• 0«2 


.116 


♦ 155 


.197 


.?96 


.415 


.407 


,56a 


.652 




0 


1 


1 


1 


1 


2 


2 


3 


3 


1 .00 


.21b 


.276 


.335 


.391 


.500 


.60* 


.666 


.724 


. 705 




1 


1 


2 


2 


3 


3 


3 


A 


A 


.10 


• 3 AB 


• A36 


.515 


.505 


,70A 


• B03 


.046 


.004 


.910 




2 


2 


3 


3 


4 


4 


A 


A 


5 


.01 


.482 


.596 


• 695 


.779 


.908 


.996 


1.026 


1 .045 


1.QS1 




2 


3 


3 


A 


b 


5 


5 


5 


5 


ERROR     ALPHA/BETA VALUES

.562      ALPHA   .051   .094   .140   .187   .281   .374   .422   .468   .511
          BETA    .511   .468   .421   .374   .281   .187   .139   .094   .051
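
The ERROR and ALPHA/BETA rows printed for this test are numerically consistent with a simple rule, sketched below in Python. This is an inference from the printed values, not a statement of the original program: the sketch assumes the total error equals 1 minus the square root of the average item r, and that alpha is the total divided by (1 + ratio), with beta taking the remainder. The names `RATIOS` and `split_error` are hypothetical.

```python
from math import sqrt

# Alpha-to-beta ratio columns, in the order printed in the cut-rule table.
RATIOS = [10.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.33, 0.2, 0.1]


def split_error(avg_item_r, ratio):
    """Split the assumed total error into (alpha, beta) for one ratio column.

    Assumption (inferred from the printout, not documented in it):
    total error = 1 - sqrt(average item r), alpha = total / (1 + ratio).
    """
    total = 1.0 - sqrt(avg_item_r)
    alpha = total / (1.0 + ratio)
    return alpha, total - alpha


# Average item r printed for the Concept 2 test above.
rows = [split_error(0.1921, r) for r in RATIOS]
alphas = [round(a, 3) for a, _ in rows]
betas = [round(b, 3) for _, b in rows]
print("ALPHA", alphas)
print("BETA ", betas)
```

Under these assumptions the rounded values agree with the ALPHA and BETA rows of the printout to three decimals, which also gives a way to check digits that are hard to read in the scan.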








PROBLEM IDENTIFICATION...NO-SHIFT 4-OPT 5-ITEM CONCEPT 3 TEST (1.21.)

NUMBER OF CASES ... 12

NUMBER OF ITEMS ... 5



SUBJECT 

T.n. 


1 


2 


3 


T T 
4 


F. M 

5 6 7 


TOTAl 

ft 9 10 SCORE 


103 


0 


0 


0 


0 


0 . 


0.0 


104 


0 


0 


0 


0 


1 ) 


0.0 


213 


0 


0 


0 


0 


0 


0.0 


214 


0 


0 


1 


0 


0 


1.0 


153 


0 


0 


0 


0 


0 


0.0 


154 


1 


1 


1 


1 


1 


5.0 


1 OR 


0 


0 


0 


0 


1 


1.0 


107 


1 


0 


0 


0 


0 


1.0 


217 


1 


' 0 


0 


0 


0 


1.0 


2 1 R 


0 


0 


0 


0 


n 


0.0 


157 


1 


1 


0 


1 


u 


3.0 


15R 


1 


1 


1 


1 


1 


5.0 


PERCENT 


PASS 


4? 


25 


25 


25 


25 





TEST MEAN 1.42
STANDARD DEVIATION . . 1.881
RELIABILITY (KR20) . . .8991
STANDARD ERROR .... .5974
AVERAGE ITEM R .... .6407



test cut 


rules 


ALPHA TO 


BETA 


RATIO 












err WT 


10.000 


5,000 J 


• 000 


2.000 


1.000 


.500 


.330 


.200 


• 100 


100.00 


.237 


.276 


.313 


.346 


.413 


.484 


.524 


.570 


.623 


1 


1 


2 


2 


2 


2 


3 


3 


3 


10.00 


.272 


.315 


.354 


.389 


.456 


• 526 


.565 


.608 


.658 




1 


2 


2 


2 


2 


3 


3 


3 


3 


1 .00 


.307 


.353 


.395 


.431 


.500 


• 569 


.606 


,647 


.693 




' 2 


? 


2 


2 


3 


3 


.3 


3 


3 


.10 


.342 


.39? 


.4 36 


.474 


,544 


.611 


.647 


,685 


.72H 


2 


? . 


2 


2 


3 


3 


3 


3 


4 


• ol 


.377 


.*3o 


.477 


.516 


.587 


• 654 


• 68 ft 


.724 


.763 




2 


2 


2 


3 


3 


3 


3 


4 


4 


ERROR 


ALPHA/BETA VALUES 
















.200 ALPHA .018 


.033 


050 


.067 


.100 


.133 


.150 


.166 


• 1 R 1 


BETA .181 


. 1 66 . 


150 


.133 


.100 


.067 


.050 


.033 


.018 






PROBLEM IDENTIFICATION...NO-SHIFT 2-OPT 10-ITEM CONCEPT 1 TEST (1.12.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 10

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

321      0   0   1   0   1   0   0   0   1   0      3.0
322      1   0   1   0   0   1   0   0   0   1      4.0
131      1   1   1   0   0   0   1   1   0   1      6.0
132      1   1   1   1   1   1   1   1   1   1     10.0
201      0   0   0   0   0   0   0   0   0   0      0.0
202      1   1   1   1   1   1   1   1   1   1     10.0

PERCENT
PASS    67  50  83  33  50  50  50  50  50  67

TEST MEAN 5.50
STANDARD DEVIATION . . 3.987
RELIABILITY (KR20) . . .9500
STANDARD ERROR .... .8917
AVERAGE ITEM R .... .6552









TEST CUT 


rules 


ALPHA TO 


seta 


RATIO 












ERR WT 


10,000 


5,000 


3,000 


2,003 


1 ,000 


.500 


.330 


,200 


• 100 


100,00 


.173 


,205 


• 236 


,266 


,329 


,401 


• 444 


,493 


.553 




2 


2 


2 


3 


3 


4 


4 


5 


6 


10*00 


• 241 


• 200 


• 316 


,349 


.415 


• 484 


• 525 


,569 


.62? 




2 


3 


3 


3 


4 


5 


5 


6 


6 


1,00 


,310 


,356 


• 396 


,433 


,500 


.567 


,605 


,644 


,690 




3 


4 


4 


4 


5 


6 


6 


6 


7 


• 10 


,370 


• A3i 


,476 


,516 


,505 


.651 


• 605 


,720 


.759 




4 


4 


5 


5 


6 


7 


7 


7 


0 


,01 


.447 


,507 


,557 


,599 


,671 


.734 


.765 


,795 


,R27 




4 


S 


6 


6 


7 


7 


8 


e 


0 


ERROR 


alpha/beta values 
















,191 ALPHA 


. .017 


.932 


• 040 


,064 


.095 


• 127 


.1*3 


• 159 


• 173 


BETA 


,173 


.159 


• 143 


.127 


,095 


• 064 


.0*7 


.032 


• 017 







PROBLEM IDENTIFICATION...NO-SHIFT 2-OPT 10-ITEM CONCEPT 2 TEST (1.12.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 10

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

321      0   0   1   1   0   0   1   1   0   0      4.0
322      0   0   1   1   0   0   1   1   1   0      5.0
131      1   1   1   0   0   1   1   0   0   1      6.0
132      1   1   1   1   1   1   1   1   1   1     10.0
201      0   1   0   0   1   0   1   1   1   1      6.0
202      1   1   1   1   1   0   0   0   0   0      5.0

PERCENT
PASS    50  67  83  67  50  33  83  67  50  50

TEST MEAN 6.00
STANDARD DEVIATION . . 2.098
RELIABILITY (KR20) . . .5640
STANDARD ERROR .... 1.3851
AVERAGE ITEM R .... .1145



TC^T CUT 
ERR Wt 


rules 

10.000 


ALPHA TO 
5.000 3 


3ETA 

.000 


RATIO 

2.000 


1«000 


.500 


.330 


.200 


100*00 


-•137 

-0 


-.154 

-1 


.151 

-1 


-.13? 

-0 


-.045 

0 


• 108 
1 


.215 

2 


• 338 
3 


10*00 


• 027 
0 


.050 

1 


• 0»3 
1 


• 124 
1 


• 228 
2 


.364 

4 


.449 

4 


.542 

5 


1*00 


• 191 
2 


• 254 
3 


• 318 
3 


• 380 

4 


.500 

5 


• 620 
6 


.693 

7 


.746 

7 


• 10 


.355 

4 


• 458 
5 


.553 

6 


• 636 
6 


.772 

8 


.876 

9 


.917 

9 


.950 

9 


• 01 


.519 

5 


.662 

7 


• 788 
8 


• 892 
9 


1.045 

10 


1.13? 

11 


1.152 

12 


1 .15* 
12 


error 


ALPHA/BETA VALUES 














.662 ALPHA .060 
SETA .601 


.110 

.551 


.165 

.496 


.441 


.331 
• 331 


.441 
• 221 


.497 
• 164 


.551 

.110 









.82 




• 100 



• 481 

5 

• 645 

6 

• 809 

8 

• 973 
10 

1*137 

11 



• 601 
• 060 



m 









* 



PROBLEM IDENTIFICATION. ..NO-5HIFT 2-OPT 10-ITEM CONCEPT 3 TEST (1.1?. I 



numrer op cases 


• 


• 


• 




12 








number OF ITEM5 


1 • 


• 


• 




10 








SUBJECT 






I 


T F. 


M 








TOTAL 


I.n. 


1 2 


3 


4 


5 


6 


7 B 


9 


10 


SCORE 


321 


0 0 


0 


0 


1 


1 


1 1 


0 


1 


b.O 


32? 


1 1 


0 


0 


1 


1 


l 0 


1 


0 


6.0 


131 


0 1 


0 


1 


1 


0 


1 1 


1 


1 


7.0 


132 


1 1 


1 


1 


1 


1 


0 0 


1 


1 


0.0 


201 


0 1 


1 


1 


1 


1 


1 1 


1 


1 


9.0 


20? 


1 1 


1 


1 


1 


1 


1 1 


1 


0 


9.0 


325 


1 1 


0 


1 


1 


1 


0 1 


1 


0 


7.0 


326 


0 0 


1 


1 


0 


1 


0 0 


1 


0 


4.0 


135 


1 1 


0 


1 


11 


1 


0 1 


1 


0 


6.0 


136 


1 0 


0 


0 


0 


1 


0 0 


0 


0 


2.0 


205 


1 0 


0 


1 


1 


0 


1 1 


1 


0 


6.0 


206 


1 1 


1 


1 


1 


1 


1 1 


1 


1 


10.0 


PERCENT 




















PASS 


67 67 4? 


75 


75 


03 


59 67 i 


83 


4? 




TEST MEAN • . • 










CO 

m 

*> 








5T ANpAPO 


OEVIATION 


• • 




2.275 








RELIABILITY (KR20) 


• • 




.6712 








standard 


ERROR 


• 


• 


• • 




1.39*3 








AVERAGE 


ITEM R 


• 


• 


• . 




.1695 










TE«5T CUT POLES 







ALPHA 


TO SETA 


PATIO 


ERR WT 


10.000 


5.000 


J.ooo 


2.000 


100.00 


.069 


.100 


.138 


.181 




1 


1 


1 


? 


10.00 


.139 


.105 


.234 


• 284 




1 


? 


2 


3 


1 .00 


.209 


.270 


.331 


• 38Q 




2 


3 


3 


4 


.10 


.279 


.355 


.427 


.49? 




3 


4 


4 


5 


.01 


.349 


.440 


.523 


.596 




3 


4 


5 


6 



ERROR ALPHA/RETA VALUES 



.1*7 . .196 

.*♦1 .392 



1 .000 


.500 


.330 


.200 


• 100 


.201 


.404 


.479 


.560 


• 651 


3 


4 


5 


6 


7 


.390 


.506 


.575 


.645 


.721 


4 


5 


6 




7 


.500 


.612 


.671 


.730 


.791 


b 


6 


7 


7 


8 


.610 


.716 


.76 f 


.015 


.961 


6 


7 


0 


0 


9 


.719 


.619 


.063 


.900 


.932 


7 


0 


9 


9 


9 



m 



.294 


.39? .4*2 


• 490 


.535 


• 29a 


.196 V \,1*S 


.090 


.053 






"'<SL3 



580 ALPHA .053 
BETA .535 



098 

490 



535 

053 



PROBLEM IDENTIFICATION...NO-SHIFT 4-OPT 10-ITEM CONCEPT 1 TEST (1.22.)



NUMBER OF CASES . 


• 


• 




6 








number of hems . 


• 


• 


10 








SUBJECT 






I T 


E 


M 








total 


I.n. 


1 


2 3 


4 


S 


6 


7 B 


9 


10 


SCORE 


101 


0 


0 0 


0 


0 


0 


0 0 


0 


0 


0.0 


102 


I 


0 o 


0 


1 


1 


0 0 


0 


1 


4.0 


211 


0 


0 0 


0 


0 


1 


0 0 


1 


0 


2.0 


212 


0 


1 0 


1 


1 


1 


1 1 


1 


1 


8.0 


151 


0 


0 0 


0 


0 


0 


0 0 


0 


0 


0.0 


152 


1 


0 o 


1 


1 


1 


0 1 


0 


0 


S.O 


PERCENT 




















PASS 


33 


17 0 


33 


So 


67 


17 33 : 


33 


33 




TEST MEAN . 










3.17 








standard deviation 


• 




3.125 








RELIABILITY 


(KR2M 


• 




.9994 








standaru 


ERROR • 


• 


• 




.9913 








average 


ITEM R . 


• 


• 




.472^ 









TEST CUT 


rules 


alpha to 


3E T A 


RATIO 












ERR WT 


10.000 


s.ooo 


3.000 


2.000 


l.OUO 


.500 


.330 


.200 


.100 


100.00 


• 103 


.133 


• 165 


.198 


.272 


.361 


• 416 


.470 


.552 




1 


1 


2 


2 


3 


A 


4 


5 


6 


10.00 


.190 


• 230 


• 27o 


• 30H 


• 3dr> 


.472 


• 521 


.575 


.638 




2 


2 


3 


3 


4 


5 


5 


6 


6 


1 .00 


.276 


.327 


• 3 7S 


.418 


.500 


.582 


• 626 


.673 


.724 




3 


3 


4 


4 


5 


6 


6 


7 


7 


• 10 


.362 


.42S 


.480 


.529 


.614 


.692 


.7*31 


.770 


.810 




4 


4 


s 


5 


6 


7 


7 


0 


0 


• 01 


.446 


• S22 


• 585 


.639 


.728 


• 00? 


.836 


.867 


.897 




4 


5 


6 


6 


7 


8 


B 


9 


9 


ERROR 


ALPHA/8FTA VALUES 
















.313 ALPHA '.028 


.052 


.078 


.104 


.156 


.209 


.235 


.261 


.205 


BETA .28S 


.261 


.235 


.209 


.156 


• 1 U4 


.07« 


.052 


.020 






PROBLEM IDENTIFICATION. ..NO-SHIFT 4 -O 0 T 10-ITEM CONCEPT 2 TEST (1.22.) 



NUMBER 


OF 


CASES . 


• 


• 




6 










NIJMRER 


OF 


ITEMS . 


• 


• 




10 










SUBJECT 








I T 


E 


M 










10TAL 


I.ft. 


1 


2 


3 


4 


5 


6 


7 


R 


9 


10 


SCORE 


101 


1 


0 


0 


0 


1 


0 


0 


l 


n 


0 


3.0 


10? 


0 


0 


1 


0 


0 


0 


0 


0 


0 


0 


1 .0 


211 


1 


1 


1 


0 


0 


1 


0 


0 


0 


0 


4.0 


212 


1 


1 


1 


0 


0 


1 


1 


0 


l 


1 


7.0 


151 


1 


1 


0 


1 


0 


1 


1 


1 


l 


0 


7.0 


15? 


1 


1 


1 


1 


1 


1 


l 


1 


l 


1 


10.0 


PERCENT 
























PASS 


R3 


67 


67 


33 


33 


67 


50 


50 


50 


33 





TEST MEAN 5.33 

STANOARD DEVIATION . . 3.266 

RELIABILITY (KR20) . . .879* 

STANDARD ERROR .... 1.1331 

AVERAGE ITEM H 



TEST CUT RULES 







ALPHA TO 


BETA 


RATIO 












ERR WT 


10.000 


5.000 3 


• 000 


2.000 


1.000 


.500 


.330 


• 200 


• 100 


100.00 


.003 


.110 


• 142 


.175 


.252 


.347 


.405 


.471 


.550 




1 


1 


1 


2 


3 


3 


4 


5 


5 


10.00 


.174 


.215 


.255 


• ?9S 


.376 


• 466 


.519 


.576 


.642 




2 


2 


3 


3 


4 


5 


5 


6 


6 


1.00 


• 266 


.319 


.369 


.414 


.500 


.586 


.632 


.681 


.734 




3 


3 


4 


4 


5 


6 


6 


7 


7 


.10 


• 35b 


.424 


.4B3 


.534 


.624 


.70S 


.746 


.785 


.826 




4 


4 


5 


5 


6 


7 


7 


e 


B 


.01 


.450 


.529 


.596 


• 653 


.749 


.825 


.859 


.R9o 


.917 




5 


5 


b 


7 


t 


B 


9 


9 


9 


error 


ALPHA/BETa values 
















.350 ALPHA 


.032 


.058 


068 


.117 


.175 


.233 


?63 


.292 


.318 


BETA 


.318 


.292 


263 


.233 


.175 


.117 


087 


.058 


.032 










PROBLEM IDENTIFICATION...NO-SHIFT 4-OPT 10-ITEM CONCEPT 3 TEST (1.22.)



number or 


CASES 


• 


• 


• 


12 








number or 


ITEMS 


• 


• 


• 


10 








SUBJECT 








I T 


E 


M 








TOTAL 


I .D. 


1 


2 


3 


4 


5 


6 


7 e 


9 


10 


SCORE 


101 


0 


0 


0 


0 


0 


0 


0 0 


0 


0 


0.0 


102 


0 


0 


0 


0 


0 


0 


0 0 


0 


0 


0,0 


211 


1 


1 


1 


1 


1 


1 


l i 


1 


1 


10.0 


212 


1 


1 


1 


1 


1 


1 


l l 


1 


1 


10«0 


151 


1 


0 


1 


1 


0 


1 


1 0 


0 


0 


5.0 


152 


1 


1 


1 


1 


1 


1 


1 1 


1 


1 


10.0 


105 


0 


0 


0 


0 


0 


0 


0 0 


0 


0 


0,0 


106 


0 


0 


1 


0 


0 


0 


0 0 


0 


0 


1 .0 


215 


0 


0 


0 


0 


0 


0 


1 0 


0 


0 


1.0 


226 


0 


0 


0 


0 


0 


0 


0 1 


0 


0 


1 .0 


155 


1 


1 


1 


1 


1 


1 


1 1 


1 


1 


l 0 • 0 


156 


0 


0 


0 


0 


0 


1 


coo 


0 


0 


1.0 


PERCENT 






















PASS 42 


33 sn 


42 


33 


50 


50 42 : 


33 


33 




TEST MEAN • 












4,08 








STANDARD 


DEVIATION 


• 


• 


4,562 








RELIABILITY 


(KR20) 


• 


• 


• 9847 








STANOARU 


ERROR 




• • 


• 


• 


• 5647 








AVERAGE 


ITEM R 




• • 


• 


• 


• 8653 









test cut 


rules 


ALPHA 


TO BETA 


RATIO 












ERR WT 


10,000 


5,000 


3,000 


2,000 


1 .non 


• 500 


,330 


• 200 


• 100 


100.00 


.306 


.339 


• 368 


• 394 


,44? 


.493 


• 522 


• 555 


• 595 




3 


3 


4 


4 


4 


5 


5 


6 


6 


10.00 


.330 


• 365 


• 396 


• 42? 


,4 M 


.521 


• 550 


.581. 


• 620 




3 


4 


4 


4 


5 


5 


5 


6 


6 


1,00 


• 355 


.392 


• 423 


• 451 


,500 


.549 


• 577 


,608 


,645 




4 


4 


4 


5 


5 


5 


6 


6 


6 


• in 


• 3tt0 


.419 


.451 


• 479 


,529 


.578 


• 60S 


• 635 


• 670 




4 


4 


5 


5 


5 


6 


6 


6 


7 


• 01 


.405 


.445 


• 479 


• 507 


,558 


• 606 


,633 


,661 


• 694 




4 


4 


5 


5 


6 


6 


6 


7 


7 


ERROR 


ALPHA /SET A VALUES 














.070 ALPHA .006 


• 012 


.017 


• 023 


• 035 


.047 


>052 


• 058 


• 063 


BETA .063 


• 058 


.052 


• 047 


• 035 


• 023 


>017 


• 012 


,006 






PROBLEM IDENTIFICATION...SHIFT 2-OPTION 5-ITEM CONCEPT 1 TEST (2.11.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 5

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

113      0   1   1   1   1                          4.0
114      1   1   1   1   1                          5.0
313      1   1   1   0   1                          4.0
314      1   1   1   1   1                          5.0
303      1   1   1   1   1                          5.0
304      1   1   1   1   1                          5.0

PERCENT
PASS    83 100 100  83 100

TEST MEAN 4.67
STANDARD DEVIATION . . .516
RELIABILITY (KR20) . . -.0521
STANDARD ERROR .... .5297
AVERAGE ITEM R .... -.0100

CUT RULES NOT COMPUTED FOR THIS TEST.
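
The statistics printed under each item matrix can be reproduced from the 0/1 responses. The Python sketch below is a hedged reconstruction, not the original program. It assumes that KR-20 uses the sample (n - 1) variance of the total scores, that the standard error is SD * sqrt(1 - KR20), and that the average item r is obtained from KR-20 by inverting the Spearman-Brown formula. The function name `item_statistics` is hypothetical; the data are the six response rows of the test above.

```python
from math import sqrt


def item_statistics(responses):
    """Return (mean, sd, kr20, sem, avg_item_r) for a 0/1 response matrix.

    Assumptions (not confirmed by the printout itself):
    - total-score variance uses the n - 1 denominator;
    - item variances are p * (1 - p) with p the proportion passing;
    - avg_item_r inverts Spearman-Brown: r = KR20 / (k - (k - 1) * KR20).
    """
    n = len(responses)          # number of cases
    k = len(responses[0])       # number of items
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / (n - 1)
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    kr20 = (k / (k - 1)) * (1.0 - sum_pq / var)
    sem = sqrt(var) * sqrt(1.0 - kr20)
    avg_item_r = kr20 / (k - (k - 1) * kr20)
    return mean, sqrt(var), kr20, sem, avg_item_r


# Shift, 2-option, 5-item, Concept 1 test: subjects 113, 114, 313, 314, 303, 304.
responses = [
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

stats = item_statistics(responses)
print("mean=%.2f sd=%.3f kr20=%.4f sem=%.4f avg_r=%.4f" % stats)
```

With these assumptions the computed values round to the printed 4.67, .516, -.0521, .5297, and -.0100. A near-zero or negative KR-20 arises here because almost every subject passed every item, which is presumably why no cut rules were computed for this test.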






PROBLEM IDENTIFICATION...SHIFT 2-OPTION 5-ITEM CONCEPT 2 TEST (2.11.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 5

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

113      1   1   1   1   1                          5.0
114      1   1   0   1   0                          3.0
313      1   0   1   1   0                          3.0
314      0   1   1   0   1                          3.0
303      1   1   1   1   1                          5.0
304      1   1   1   1   1                          5.0

PERCENT
PASS    83  83  83  83  67

TEST MEAN 4.00
STANDARD DEVIATION . . 1.095
RELIABILITY (KR20) . . .4398
STANDARD ERROR
AVERAGE ITEM R .... .1357



TEST CUT 


rules 


alpha 


TO 3ETA 


RATIO 


ERR ¥T 


10.000 


5.000 


... 3.000 


2.000 


100.00 


-.109 


-.116 


-.108 


-.085 




-0 


-0 


-0 


0 


10.00 


.045 


.072 


• 108 


.149 




0 


0 


1 


1 


1 .00 


.198 


• 26} 


.323 


.383 




1 


1 


a 2 


2 


.10 


.352 


.449 


.539 


.617 




2 


2 


3 


3 


.01 


.505 


.638 


.755 


.851 




3 


3 


4 


4 


ERROR 


ALPHA/8ETA VALUES 




.63? ALPHA 


.057 


.105 


.158 


.211 


BETA 


.574 


.526 


.474 


• 421 



l.OUn 


• 500 


.330 


.200 


.100 


.004 


.149 


.248 


.362 


.495 


U 


1 


1 


2 


2 


.262 


• 383 


.463 


.551 


.648 


) 


2 


2 


3 


3 


.500 


.617 


.678 


.739 


.802 


3 


3 


3 


4 


4 


.748 


.851 


.893 


.928 


.955 


4 


4 


4 


5 


5 


.996 


1.085 


1.108 1 


.116 


1.109 


5 


5 


6 


6 


6 


.316 


.421 


.475 


,526 


.574 


• 316 


• 211 


.157 


,105 


.057 







problem identification. ..shift 2 -option 5-item concept 3 test te.n.) 

NUMBER OF CASES ... 12 

NUMBER OF ITEMS ... 5 



SUBJECT 
I.O. 1 

in o 

114 0 

313 0 

314 1 

303 1 

304 1 

127 0 

12B 0 

317 0 

31« ft 

3n7 1 

30R 0 



ITEM 

2 3 4 5 6 7 

0 0 0 0 
0 10 0 
0 110 
OllO 
1111 
0 10 1 
0 0 0 0 
10 11 
l 0 0 0 

110 0 
0 111 
0 0 0 1 



total 

B Q 10 SCORE 
0.0 
1.0 
2.0 

3.0 

5.0 

3.0 

0.0 

3.0 

1.0 
2.0 



PERCENT 

PASS 33 33 58 42 42 

TEST mean 2.08 

STANDARD DEVIATION , . 1,564 

RELIABILITY (KR20) . . .6505 

STANDARD ERROR 9248 

AVERAGE ITEM R • « . . .2712 



TEST CUT RULES 







alpha TO 


BETA 


RATIO 












ERR WT 


10.000 


5.000 3 


.000 


2.000 


l.ooo 


.500 


.330 


.200 


• 100 


100.00 


• 120 


.15A 


.199 


.241 


.334 


.441 


.504 


.573 


.650 




1 


1 


1 


1 


2 


2 


3 


3 


3 


10. frn 


.178 


.225 


.274 


.321 


.417 


.520 


.570 


.640 


.70B 




1 


1 


1 


2 


2 


3 


3 


3 


4 


1.00 


.235 


.293 


.348 


.40 0 


.500 


• 600 


.65} 


.707 


.765 




1 


1 


2 


2 


3 


3 


3 


4 


4 


.10 


.292 


.360 


.423 


.4B(i 


.583 


.679 


.727 


.775 


.822 




1 


2 


2 


2 


3 


3 


4 


4 


4 


.01 


.350 


.427 


.497 


.559 


• 666 


.759 


.B«>2 


.842 


• BBO 




2 


2 


2 


3 


3 


4 


4 


4 


4 


error 


ALPHA/BETA VALUES 
















.4 70 ALPHA 


.04* 


. 0B0 . 


120 


.160 


.240 


.319 


.360 


399 


.416 


BETA 


.436 


.399 


359 


.319 


.240 


• 160 


.119 


oao 


• 044 






PROBLEM IDENTIFICATION...SHIFT 4-OPTION 5-ITEM CONCEPT 1 TEST (2.21.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 5

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

123      1   1   1   1   1                          5.0
124      0   0   1   0   1                          2.0
143      0   0   0   0   0                          0.0
144      1   1   1   1   1                          5.0
223      1   0   0   0   0                          1.0
224      1   1   1   1   1                          5.0

PERCENT
PASS    67  50  67  50  67





TEST MEAN 3.00 

STANOARO DEVIATION . . 2.280 

RELIABILITY IKR20) . . ,969ft 

STANOARO ERROR 

AVERAGE ITEM R B641 



TEST CUT 


RULES 


ALPHA TO 


beta 


RATIO 












ERR WT 


10.000 


5.000 


J.ono 


2.000 


i.OUO 


.500 


.330 


.200 


.10 0 


100.00 


.256 


• 285 


.312 


.337 


.384 


.436 


.467 


.502 


.546 




1 


l 


2 


2 


? 


2 


2 


3 


3 


10.00 


.305 


.339 


. 368 


.393 


.442 


.493 


.522 


.555 


*595 




2 


2 


2 


2 


2 


2 


.1 


3 


3 


1.00 


.355 


.392 


.423 


.459 


.500 


*550 


.577 


.608 


.645 




2 


2 


2 


2 


3 


3 


3 


3 


3 


.10 


.405 


.445 


.479 


.507 


.Sb* 


.607 


.633 


.66] 


.995 




2 


2 


2 


3 


3 


3 


3 


3 


3 


.01 


.454 


.498 


.534 


.564 


.616 


.663 


. 6HH 


.715 


.744 




2 


2 


3 


3 


3 


3 


3 


4 


4 


ERROR 


ALPHA/BETA VALUES 
















.070 ALPHA 


• 006 


.012 


.018 


.023 


• 03b 


.047 


,053 


.059 


.064 


BETA 


.064 


.059 


.053 


.047 


.035 


.023 


,017 


.012 


.006 









PROBLEM IDENTIFICATION...SHIFT 4-OPTION 5-ITEM CONCEPT 2 TEST (2.21.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 5

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

123      1   1   1   1   1                          5.0
124      0   1   1   1   1                          4.0
143      1   0   0   0   1                          2.0
144      1   1   1   1   1                          5.0
223      1   1   0   1   0                          3.0
224      1   1   1   1   1                          5.0

PERCENT
PASS    83  83  67  83  83

TEST MEAN 4.00
STANDARD DEVIATION . . 1.265
RELIABILITY (KR20) . . .6424
STANDARD ERROR
AVERAGE ITEM R .... .2643



TEST CUT RULES 







ALPHA 


TO dF_ T A 


ratio 












ERR WT 


10.000 


b.000 


3. 0U0 


2.000 


1.000 


• 300 


.330 


.200 


• 100 


100.00 


• 001 


• O 1.9 


• 044 


.077 


.182 


• 278 


• 352 


.436 


.535 




0 


0 


O 


0 


1 


1 


2 


2 


3 


c 

o 

c 


.117 


.155 


• 1 96 


.23A 


.331 


.439 


.501 


.572 


.851 




1 


1 


1 


1 


2 


2 


3 


3 


3 


1.00 


.233 


.29) 


.347 


.399 


.500 


• oOl 


.654 


.709 


.767 




1 


1 


2 


2 


3 


3 


3 


4 


4 


• 10 


.340 


.42R 


. 4 90 


.561 


.669 


.76? 


• AOS 


.845 


• HB3 




2 


2 


2 


3 


3 


4 


4 


4 


4 


.01 


.465 


.564 


• 650 


.72? 


.8j8 


.923 


.956 


.981 


.99R 




2 


3 


3 


4 


4 


5 


S 


. 5 


5 


error 


ALPMA/RETa VAlJES 














.486 ALPHA 


• 044 


.0«1 


.121 


• 162 


.243 


• 324 


365 


.405 


.442 


BETA 


.442 


.405 


• 364 


.324 


.?43 


• 16? 


121 


.081 


• 044 




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PROBLEM IDENTIFICATION. ..SHIFT 4-OPTION 5-ITEM CONCEPT 3 TEST (2.21.) 



NUMBER OF CASES ... 12 

NUMBER OF ITEMS ... 5 



SUBJECT 

i.n. 


1 


2 


3 


l T 
4 


E M 

5 6 7 8 


TOTAL 
9 10 SCORE 


123 


1 


0 


1 


1 


1 


4.0 


124 


n 


1 


1 


1 


1 


4.0 


143 


n 


0 


0 


0 


0 


0.0 


144 


l 


1 


1 


1 


1 


5.0 


223 


l 


0 


0 


1 


1 


3.0 


224 


l 


0 


1 


1 


1 


4.0 


n? 


l 


0 


0 


1 


1 


3.0 


lift 


l 


0 


1 


0 


n 


2.0 


147 


l 


0 


0 


1 


0 


2.0 


14A 


0 


0 


0 


0 


0 


0.0 


?27 


l 


0 


0 


1 


l 


3.0 


22ft 


0 


1 


1 


0 


0 


2.0 


PERCENT 

PASS 


67 


25 


5(1 


67 


5ft 





TEST MEAN 2.67 

STANDARD DEVIATION . . 1.557 

RELIABILITY (KR20! . . .6699 

STANDARD ERROR 89*5 

AVERAGE ITEM R 2887 






TEST CUT 


RULES 


ALPHA to 


8ETa 


RATIO 












ERR WT 


10.000 


5.000 


4.000 


2*000 


1 .000 


.500 


.330 


.200 


.100 


100.00 


.12/ 


.166 


.207 


.249 


.340 


.445 


.50 r 


.574 


• 650 


1 


1 


1 


1 


2 


2 


3 


3 


3 


10.00 


.183 


.231 


.279 


• 326 


.42ft 


.522 


.578 


.639 


.705 




1 


1 


1 


2 


2 


3 


3 


3 


4 


1.00 


.23V 


.296 


.351 


.40? 


.Son 


.598 


.650 


.704 


.761 




1 


1 


2 


2 


J 


3 


3 


4 


4 


.10 


• 295 


.361 


.423 


.478 


.58ft 


.674 


.72? 


.769 


• SI 7 


1 


2 


2 


2 


3 


3 


4 


4 


4 


.01 


.350 


.426 


.495 


.555 


.66ft 


.751 


.794 


.834 


.073 


ERROR 


2 2 
ALPHA/AETA VALUES 


2 


3 


J 


4 


4 


4 


4 


.463 ALPHA .0*2 


.077 


.116 


.154 


.231 


• 306 


.348 


.386 


.421 


BETA .421 


.366 


.347 


.306 


.231 



.154 


.115 


.07/ 


.042 






* 



problem identification. ..shift 2 -opttion io-item concept i test ( 2 . 12 .) 

number of cases . . , 6 

NUMBER of ITEMS . • • 10 



SUBJECT 

l.n. 


1 


2 


1 

3 


4 


T E 

5 


M 

6 


7 e 


9 


10 


TOTAL 

SCORE 


121 


1 


1 


0 


0 


0 


0 


0 0 


1 


1 


4.0 


122 


0 


0 


1 


0 


1 


0 


0 1 


0 


1 


4.0 


311 


1 


1 


1 


1 


1 


1 


1 1 


1 


1 


10.0 


312 


1 


1 


1 


1 


1 


l 


1 1 


1 


1 


10.0 


301 


1 


1 


l 


I 


1 


1 


1 1 


1 


1 


10.0 


30? 


1 


1 


1 


1 


1 


l 


1 1 


1 


1 


10.0 


PERCENT 

PASS 


83 


83 


B3 67 


83 


67 


67 83 


S1100 




TEST mean . 












8.00 








STANOARD 


DEVIATION 




• • 




3.098 








reliability 


(KR20) 




< • 




.937*5 








STANOARD 


ERROR 


• 


• 


• • 




• 7746 








AVERAGE 


ITEM R 


• 


• 


• . 




.6000 









TEST CUT RULES 







alpha to 


9ETA 


RATIO 












ERR NT 


10.000 


5.000 ^ 


1.000 


2.000 


1.000 


.500 


.330 


.200 


• 100 


100.00 


.153 


.184 


.216 


.247 


.314 


.391 


.437 


.490 


.55% 




2 


2 


2 


2 


3 


4 


4 


5 


6 


10.00 


.226 


.266 


.303 


.338 


.407 


• 481 


.524 


.571 


.627 




2 


3 


3 


3 


4 


5 


6 


6 


6 


1.00 


.300 


.347 


.39 0 


.4?fl 


.500 


.572 


.611 


.653 


.700 




3 


3 


4 


4 


5 


6 


6 


7 


7 


.10 


.373 


.429 


.477 


.519 


.593 


. 66? 


.6RB 


.734 


.774 




4 


4 


5 


5 


6 


7 


7 


7 


8 


.01 


. 446 


.510 


.564 


.609 


.686 


.753 


. 788 


.816 


.847 




4 


5 


6 


6 


7 


8 


8 


8 


8 


error 


alpha/beta values 
















.225 ALPHA 


• 020 


.036 


056 


.075 


.113 


• 150 


.169 


> 1 88 


• 205 


BETA 


.205 


. 188 


169 


.150 


.113 


.075 


.058 


,038 


.020 






PROBLEM IDENTIFICATION...SHIFT 2-OPTION 10-ITEM CONCEPT 2 TEST (2.12.)

NUMBER OF CASES ... 6
NUMBER OF ITEMS ... 10

SUBJECT                I T E M                     TOTAL
I.D.     1   2   3   4   5   6   7   8   9  10    SCORE

121      1   1   0   0   0   1   0   0   0   1      4.0
122      1   1   1   1   1   0   1   1   0   1      8.0
311      1   1   1   1   1   1   1   1   1   1     10.0
312      0   0   0   0   0   0   0   0   0   0      0.0
301      1   1   1   1   1   1   1   1   1   1     10.0
302      1   1   1   1   1   1   1   1   1   1     10.0

PERCENT
PASS    83  83  67  67  67  67  67  67  50  83

TEST MEAN 7.00
STANDARD DEVIATION . . 4.147
RELIABILITY (KR20) . . .9819
STANDARD ERROR .... .5578
AVERAGE ITEM R .... .8444



test cut 


RULES 


ALPHA TO 


beta 


RATIO 












ERR WT 


10.000 


5.000 


3.000 


2.000 


1.000 


• 500 


• 330 


.200 


• 100 


100.00 


.247 


4277 


• 304 


.329 


.379 


• 433 


.465 


.501 


.547 




2 


3 


3 


3 


4 


4 


5 


5 


5 


10.00 


.290 


.33? 


• 362 


.309 


.439 


.492 


.523 


.557 


.599 




3 


3 


4 


4 


4 


5 


5 


6 


6 


1.00 


. 3S0 


• 309 


.420 


.448 


.500 


.552 


.501 


.612 


.650 




3 


4 


4 


4 


5 


6 


6 


6 


7 


• 10 


.401 


.443 


.470 


• SOB 


.561 


• 611 


.630 


.660 


.702 




4 


4 


5 


5 


6 


6 


6- 


7 


7 


• 01 


.453 


1 499 


• 536 


.567 


.621 


.671 


.696 


.723 


.753 




5 


5 


5 


6 


6 


7 


7 


7 


P, 


ERROR 


alpha/beta values 
















.001 ALPHA 


.007 


.01* 


• 020 


.027 


• 041 


.05* 


.061 . 


060 


.074 


BET A 


.074 


.068 


• 061 


.054 


• 041 


.027 


• 020 • 


014 


.007 






PROBLEM IOENTIFICATION... SHIFT 2-OPTIUN 10-ITEM CONCEPT 3 TEST (2.12.) 

NUMBER OF CASES ... 12 

NUMBER OF ITEMS ... 10 



SUBJECT 

1*0. 


1 


2 


3 


I 

' 4 


T E 
5 


M 

6 


7 


8 


9 


10 


total 

SCORE 


121 


0 


0 


0 


1 


1 


1 


1 


0 


1 


0 


5*0 


\ 22 


1 


0 


0 


1 


1 


1 


0 


1 


0 


1 


6*0 


311 


1 


0 


i 


1 


0 


1 


1 


1 


1 


0 


7,0 


312 


1 


1 


0 


0 


1 


1 


1 


0 


0 


1 


6*0 


301 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


10*0 


30? 


1 


1 


1 


0 


0 


0 


0 


0 


0 


0 


3,0 


125 


0 


1 


1 


1 


1 


1 


0 


1 


1 


0 


7,0 


126 


0 


0 


0 


1 


1 


1 


0 


0 


0 


1 


4.0 


315 


0 


1 


1 


0 


1 


0 


0 


1 


1 


0 


5.0 


316 


1 


0 


1 


1 


1 


1 


1 


0 


0 


0 


6.0 


305 


1 


0 


0 


0 


0 


1 


1 


1 


1 


0 


5.0 


306 


0 


1 


1 


1 


1 


1 


1 


0 


1 


0 


7.0 


PERCENT 

PASS 


SB 


50 


5B 


67 


75 


03 


56 


50 


SB 


33 





TEST MEAN 5.92 

STANOARD DEVIATION . , 1,782 

RFL I ABILITY (KR20) . . .32ofl 

STANOARO ERROR .... 1,4627 

AVERAGE ITEM R .... .0461 



TEST COT RULES 







ALPHA 


TO BETA 


RATIO 












err wt 


10.000 


5.000 


3.00(1 


2.000 


1.000 


.500 


.330 


.200 


.100 


100.00 


-.073 


-.ftbo 


-.069 


-.042 


.000 


.226 


.341 


,466 


,608 




-0 


-0 


-0 


0 


1 


2 


3 


5 


6 


10.00 


.043 


.073 


. 1 1 3 


.161 


. 2*0 


.432 


,523 


,621 


.725 




0 


1 


1 


2 


3 


4 


S 


6 


7 


1 .00 


.159 


• 226 


.296 


,365 


.500 


,635 


.70S 


.774 


,641 




2 


? 


3 


4 


5 


6 


7 


B * 


R 


.10 


.275 


.370 


.479 


,566 


.720 


,839 


.808 


.9?7 


,957 




3 


4 


S 


6 


7 


8 


9 


9 


10 


.01 


.392 


.53? 


• 662 


.77? 


.94o 


1.04? 


1.070 1 


.OBO 


1.073 




4 


5 


7 


8 


9 


10 


11 


11 


11 


ERROR 


ALPHA/flETA values 














,785 ALPHA 


.071 


.131 


.196 


,262 


.393 


.523 


• S90 


654 


.714 


BETA 


.714 


. 654 


.589 


.523 


.393 


.262 


. 19S 


131 


.071 



PROBLEM IDENTIFICATION,. .SHIFT 4-0PTIQN 10-ITCM CONCEPT l TEST <2.22. > 

NUMBER OF CASES ... 6 

number OF ITEMS • . • 10 

TOTAL 
8 9 10 SCORE 

1 1 1 10.0 

000 2*0 

1 1 M 10.0 

1 0 0 2.0 

1 11 10.0 

1 1 1 10.0 

PERCENT 

PASS 67 67 83 83 67 83 67 83 67 67 

TEST MEAN .*••♦•. T, 33 

STANOARO DEVIATION . . 4,131 

RELIABILITY (KR20) , . .9881 

Vanoard ERROR . . ♦ ♦ .4500 

AVERAGE ITEM R 



TEST CUT 


rules 


ALPHA 


TO 3ETA 


RATIO 












ERR WT 


10.000 


5.000 


9.000 


2.000 


l.oon 


.500 


.330 


,200 


.100 


100.00 


.270 


.299 


.324 


• 348 


.392 


.441 


,469 


,502 


.544 




3 


3 


3 


3 


4 , 


4 


5 


5 


5 


10.00 


.316 


.348 


.378 


• 400 


,446 


.494 


,521 


.552 


.590 




3 


3 


4 


4 


4 


5 


5 


6 


6 


1,00 — 


, .36J 


.398 


,428 


.453 


.500 


.547 


.573 


.602 


,637 




4 


4- 


4 


5 


5 


5 


6 


6 


6 


• 10 


.410 


• 448 


.480 


.506 


.554 


• 600 


,625 


.652 


,684 




4 


4 


5 


5 


6 


6 


6 


7 


7 


• 01 


.456 


.498 


.531 


.559 


.60« 


• 652 


.676 


.701 


,730 




5 


5 


5 


6 


6 


7 


7 


7 


7 


ERROR 


ALPHA/BETA VALUES 














,055 ALPHA 


i .005 


• 009 


.014 


• 018 


.028 


.037 


.0*1 


.046 


,050 


BETA 


i .050 


• 046 


.041 


.037 


.028 


• 018 , 


.01* 


,009 


,005 



SUBJECT ITEM 

1 , 0 . 1 2 3 4 5 6 7 

111 1 1111 11 

112 0 0 1 1 0 0 0 

141 1111111 

142 0 0 0 0 0 1 0 

221 1 1 11 11 1 

222 1 1 1 1 1 1 1 



PROBLEM IDENTIFICATION ... SHIFT 4-OPTION 10-ITEM CONCEPT 2 TEST (2.22.)

NUMBER OF CASES ...  6

NUMBER OF ITEMS ... 10

SUBJECT                  ITEM                     TOTAL
 I.D.     1   2   3   4   5   6   7   8   9  10   SCORE

  111     1   1   1   1   1   1   1   1   1   1    10.0
  112     1   0   0   0   0   0   0   1   1   0     3.0
  141     1   1   1   1   1   1   1   1   1   1    10.0
  142     1   0   0   0   0   1   0   0   1   0     3.0
  221     1   1   1   1   1   1   1   1   1   1    10.0
  222     1   1   1   1   1   0   1   1   1   1     9.0

PERCENT
  PASS  100  67  67  67  67  67  67  83 100  67

TEST MEAN  . . . . . .  7.50
STANDARD DEVIATION . .  3.507
RELIABILITY (KR20) . .  .9580
STANDARD ERROR . . . .  .7184
AVERAGE ITEM R . . . .  .6954
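The summary statistics for the Concept 2 test can be recomputed from its item-response matrix. The sketch below is an illustration, not the original program: it assumes the total-score variance uses the n - 1 divisor, that KR-20 uses item proportions p and q = 1 - p, that the standard error is SD x sqrt(1 - KR20), and that the average item r is obtained from KR-20 by inverting the Spearman-Brown formula. Under these assumptions the results agree with the printed figures to the precision shown. The function name `summarize` is hypothetical.

```python
from math import sqrt

# Item responses (1 = pass, 0 = fail) for the six Concept 2 cases,
# transcribed from the printout; columns are items 1-10.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # subject 111
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0],  # subject 112
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # subject 141
    [1, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # subject 142
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # subject 221
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],  # subject 222
]


def summarize(rows):
    """Recompute the printout's summary statistics under stated assumptions."""
    n_cases, n_items = len(rows), len(rows[0])
    totals = [sum(row) for row in rows]
    mean = sum(totals) / n_cases
    # Assumption: total-score variance uses the n - 1 divisor.
    variance = sum((t - mean) ** 2 for t in totals) / (n_cases - 1)
    # Proportion of cases passing each item.
    p = [sum(row[j] for row in rows) / n_cases for j in range(n_items)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    kr20 = n_items / (n_items - 1) * (1 - sum_pq / variance)
    # Assumption: average item r is the inverse Spearman-Brown of KR-20.
    avg_item_r = kr20 / (n_items - (n_items - 1) * kr20)
    return {
        "percent_pass": [round(100 * pj) for pj in p],
        "mean": mean,
        "sd": sqrt(variance),
        "kr20": kr20,
        "std_error": sqrt(variance * (1 - kr20)),
        "avg_item_r": avg_item_r,
    }


stats = summarize(responses)
for name, value in stats.items():
    print(name, value)
```

The same function can be applied to the Concept 1 and Concept 3 matrices by substituting their rows for `responses`.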



TEST CUT RULES                  ALPHA TO BETA RATIO

ERR WT  10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

100.00    .188    .219    .250    .280    .340    .408    .449    .496    .552
             2       2       3       3       3       4       4       5       6

 10.00    .253    .291    .326    .358    .420    .486    .525    .567    .617
             3       3       3       4       4       5       5       6       6

  1.00    .318    .362    .401    .436    .500    .564    .600    .638    .682
             3       4       4       4       5       6       6       6       7

   .10    .383    .433    .476    .514    .580    .642    .676    .709    .747
             4       4       5       5       6       6       7       7       7

   .01    .448    .504    .552    .592    .660    .720    .751    .781    .812
             4       5       6       6       7       7       8       8       8

ERROR         ALPHA/BETA VALUES

 .166 ALPHA    .015    .028    .042    .055    .083    .111    .125    .138    .151
      BETA     .151    .138    .125    .111    .083    .055    .041    .028    .015
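The report states that measurement error is estimated from the average inter-item correlation, but this printout does not give the formula. The sketch below shows one reading that reproduces the Concept 2 figures: total error taken as 1 - sqrt(average item r), then divided between ALPHA and BETA in proportion 1 : ratio for each column. Both formulas, and the names `total_error` and `split_error`, are assumptions inferred from the printed numbers rather than the documented algorithm.

```python
from math import sqrt

# Column headings of the cut-rule table (alpha to beta ratio).
RATIOS = [10.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.33, 0.2, 0.1]


def total_error(avg_item_r):
    """Assumed form: total error = 1 - sqrt(average inter-item correlation)."""
    return 1.0 - sqrt(avg_item_r)


def split_error(total, ratios=RATIOS):
    """Assumed split: ALPHA gets total/(1 + ratio), BETA gets the remainder."""
    alpha = [total / (1.0 + r) for r in ratios]
    beta = [total * r / (1.0 + r) for r in ratios]
    return alpha, beta


error = total_error(0.6954)          # average item r printed for Concept 2
alpha, beta = split_error(error)

print("ERROR", round(error, 3))
print("ALPHA", [round(a, 3) for a in alpha])
print("BETA ", [round(b, 3) for b in beta])
```

With the Concept 3 value (average item r = .3131) the same assumed formulas give a total error of about .440, in line with that printout.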



PROBLEM IDENTIFICATION ... SHIFT 4-OPTION 10-ITEM CONCEPT 3 TEST (2.22.)

NUMBER OF CASES ... 12

NUMBER OF ITEMS ... 10

SUBJECT                  ITEM                     TOTAL
 I.D.     1   2   3   4   5   6   7   8   9  10   SCORE

  111     0   0   1   1   1   1   1   1   0   1     7.0
  112     0   0   1   0   0   0   0   0   0   0     1.0
  141     0   0   0   1   0   0   0   0   0   0     1.0
  142     0   0   0   0   0   0   0   0   0   0     0.0
  221     1   0   1   0   1   1   0   0   0   1     5.0
  222     0   0   0   0   0   0   0   1   1   0     2.0
  115     0   0   0   0   0   0   0   0   0   0     0.0
  116     1   1   0   0   0   1   1   1   1   1     7.0
  145     0   1   0   0   0   1   1   1   1   1     6.0
  146     0   1   0   0   0   0   0   0   0   1     2.0
  225     1   0   0   0   0   0   1   0   0   1     3.0
  226     0   0   0   0   0   0   0   0   0   0     0.0

PERCENT
  PASS   25  25  25  17  17  33  33  33  25  50

TEST MEAN  . . . . . .  2.83
STANDARD DEVIATION . .  2.725
RELIABILITY (KR20) . .  .8201
STANDARD ERROR . . . .  1.1557
AVERAGE ITEM R . . . .  .3131






TEST CUT RULES                  ALPHA TO BETA RATIO

ERR WT  10.000   5.000   3.000   2.000   1.000    .500    .330    .200    .100

100.00    .137    .176    .217    .259    .348    .450    .510    .575    .649
             1       2       2       3       3       5       5       6       6

 10.00    .191    .238    .286    .332    .424    .523    .578    .637    .702
             2       2       3       3       4       5       6       6       7

  1.00    .244    .301    .354    .404    .500    .596    .647    .699    .756
             2       3       4       4       5       6       6       7       8

   .10    .298    .363    .423    .477    .576    .668    .715    .762    .809
             3       4       4       5       6       7       7       8       8

   .01    .351    .425    .492    .554    .652    .741    .784    .824    .863
             4       4       5       5       7       7       8       8       9

ERROR         ALPHA/BETA VALUES

 .440 ALPHA    .040    .073    .110    .147    .220    .294    .331    .367    .400
      BETA     .400    .367    .330    .294    .220    .147    .109    .073    .040






